I'm creating a Python script to try out web scraping. I can extract most of the text I need from a div or class, but I'm having an issue with a <script> tag.
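import requests
from bs4 import BeautifulSoup
url1 = "https://www.wowhead.com/"
html1 = requests.get(url1).text  # assumption: the page is fetched with requests
soup1 = BeautifulSoup(html1, "html.parser")
test = soup1.find_all('div', attrs={"class": "featured-content-block type-today-in-wow today-in-wow"})[0].find_all('script')[-1]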
This returns: <script>new WH.News.TodayInWoW([{"assaults":{"duration":302400,"expansion":8,...."zone":10288}}}]);</script>
What I need is the data passed to the WH.News.TodayInWoW function (I'm not sure what to call that part of the script), but I don't know how to extract it. Either a list or a dictionary would work, since I plan to filter out the keys/values I want.
Any help would be appreciated. I have already looked at a few other BeautifulSoup and email-extraction answers, but they don't work for me.
Answer:
You can try slicing out the substring. A Tag object can't be sliced directly, so convert it with str() first, then take [31:-11]:
test = str(soup1.find_all('div', attrs={"class": "featured-content-block type-today-in-wow today-in-wow"})[0].find_all('script')[-1])[31:-11]
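The two offsets are just the lengths of the fixed text around the array in str(tag): 31 characters for `<script>new WH.News.TodayInWoW(` and 11 for `);</script>`. A minimal sketch that derives them instead of hard-coding them, assuming the tag serializes exactly as shown in the question. The helper name `strip_wrapper` and the shortened sample string are hypothetical stand-ins, not the real page content:

```python
# Fixed text surrounding the JSON array in str(tag), taken from the question's output
PREFIX = "<script>new WH.News.TodayInWoW("
SUFFIX = ");</script>"


def strip_wrapper(script_html: str) -> str:
    """Return the text between PREFIX and SUFFIX, failing loudly if the wrapper changed."""
    if not (script_html.startswith(PREFIX) and script_html.endswith(SUFFIX)):
        raise ValueError("script tag does not have the expected wrapper")
    return script_html[len(PREFIX):-len(SUFFIX)]


# Hypothetical, shortened sample; the real array on the page is much longer
sample = '<script>new WH.News.TodayInWoW([{"assaults":{"duration":302400,"expansion":8}}]);</script>'
print(len(PREFIX), len(SUFFIX))  # 31 11
print(strip_wrapper(sample))
```

Checking the wrapper before slicing means a markup change raises an error instead of silently returning a mangled string.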
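To get the actual data, parse the string with json.loads rather than eval (assuming the argument is valid JSON, as the excerpt suggests). eval executes whatever code ends up in the string and fails on JSON literals such as true, false and null. Note that the result is a list whose elements are dicts, not a single dictionary:
import json
d = json.loads(test)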
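Fixed offsets break as soon as the markup changes (extra whitespace, a different wrapper). A slightly more tolerant sketch uses a regular expression to capture whatever sits between `TodayInWoW(` and the final `);`, then parses it with json.loads. It assumes the script contains a single `new WH.News.TodayInWoW(...)` call whose argument is valid JSON; the function name `extract_today_in_wow`, the sample string and the filtering step are hypothetical illustrations:

```python
import json
import re

# Capture the argument of "new WH.News.TodayInWoW(...);".
# The greedy (.*) runs to the LAST ");" so a ");" inside the data doesn't cut it short;
# DOTALL lets the match span multiple lines.
CALL_RE = re.compile(r"new\s+WH\.News\.TodayInWoW\((.*)\);", re.DOTALL)


def extract_today_in_wow(script_text: str) -> list:
    """Parse the JSON array passed to WH.News.TodayInWoW; raise if the call is missing."""
    match = CALL_RE.search(script_text)
    if match is None:
        raise ValueError("WH.News.TodayInWoW(...) call not found")
    return json.loads(match.group(1))


# Hypothetical sample; with the real page you would pass tag.string or str(tag)
sample = '<script>new WH.News.TodayInWoW([{"assaults":{"duration":302400,"expansion":8}}]);</script>'
data = extract_today_in_wow(sample)

# Example filter: pull one nested value out of every entry that has it
durations = [entry["assaults"]["duration"] for entry in data if "assaults" in entry]
print(durations)  # [302400]
```

Because the pattern anchors on the function name rather than character counts, it tolerates changes in the surrounding markup, but it will still need adjusting if the site renames the call or stops emitting plain JSON.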
Answer:
If I understand correctly, you cannot web-scrape a function. What you are scraping is only the argument passed to the function. Even assuming the website has made the script public, you would need to go to Inspect Element > Sources, then find the JS file where that function is defined!