Beautiful Soup and <script> tag - Function data

Time:02-16

I'm writing a Python script to learn web scraping. I can extract most of the text I need by div or class, but I'm having an issue with the <script> tag.

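```python
import requests
from bs4 import BeautifulSoup
url1 = "https://www.wowhead.com/"
html1 = requests.get(url1).text  # download the page HTML (assumes requests is installed)
soup1 = BeautifulSoup(html1, "html.parser")
test = soup1.find_all('div', attrs={"class": "featured-content-block type-today-in-wow today-in-wow"})[0].find_all('script')[-1]  # last <script> in the div
```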

This returns:

<script>new WH.News.TodayInWoW([{"assaults":{"duration":302400,"expansion":8,...."zone":10288}}}]);</script>

What I need is the data passed to the WH.News.TodayInWoW function (I'm not sure what to call this part of the script), but I don't know how to extract it. I'd like it as either a list or a dictionary so I can then filter out the keys/values I want.

Any help would be appreciated. I have already looked at a few other BeautifulSoup and email-extraction examples, but they don't work for me.

Answer:

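You can slice out the argument (note the [31:-11] at the end). The first 31 characters of the tag's HTML are <script>new WH.News.TodayInWoW( and the last 11 are );</script>. Take the slice on str(...) of the tag, because indexing a Tag object looks up HTML attributes rather than characters and would fail: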

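```python
# convert the Tag to its HTML string before slicing off the wrapper text
test = str(soup1.find_all('div', attrs={"class": "featured-content-block type-today-in-wow today-in-wow"})[0].find_all('script')[-1])[31:-11]
```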
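The two offsets are just the lengths of the text wrapped around the argument, and they only line up when the slice is taken on the tag's HTML string (str(tag)) rather than on the Tag object. A quick check, using a shortened, hypothetical script string in place of the real page content:

```python
# Hypothetical, shortened stand-in for str(tag); the real payload is much longer.
script_html = '<script>new WH.News.TodayInWoW([{"zone":10288}]);</script>'

prefix = "<script>new WH.News.TodayInWoW("  # text before the argument
suffix = ");</script>"                      # text after the argument

print(len(prefix), len(suffix))  # 31 11

# Slicing by the wrapper lengths is exactly the [31:-11] slice.
payload = script_html[len(prefix):-len(suffix)]
print(payload)  # [{"zone":10288}]
```

Because the offsets are hard-coded, any change in the wrapper (extra whitespace, a different call name) silently shifts the slice, so it is worth checking that the string starts and ends as expected before relying on it.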

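To get the actual data, parse the string. The argument is JSON (a list of dictionaries), so json.loads is a safer choice than eval, which would execute whatever the page sends and fails on JSON literals such as true, false and null:

```python
import json
d = json.loads(test)  # a list of dicts
```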
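Once parsed, the result is ordinary Python data, so the keys the question asks about can be filtered with plain dict operations. A minimal sketch: pick_keys is a hypothetical helper, and the payload strings are shortened, hypothetical stand-ins modeled on the fragment in the question. The last lines show why json.loads is preferable to eval for this kind of payload.

```python
import json

# Hypothetical, shortened payload modeled on the fragment in the question.
test = '[{"assaults": {"duration": 302400, "expansion": 8}, "zone": 10288}]'
data = json.loads(test)  # the top level is a list of dicts


def pick_keys(entries, wanted):
    """Keep only the wanted top-level keys of each dict (hypothetical helper)."""
    return [{k: entry[k] for k in wanted if k in entry} for entry in entries]


print(pick_keys(data, ["zone"]))        # [{'zone': 10288}]
print(data[0]["assaults"]["duration"])  # 302400

# JSON literals such as true and null parse with json.loads, but eval would raise NameError.
flags = json.loads('[{"active": true, "ends": null}]')
print(flags)  # [{'active': True, 'ends': None}]
```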

Answer:

If I understand this correctly, you cannot web-scrape a function. What you are scraping is just the arguments passed to the function. Even if the website makes the script public, you would need to go to Inspect Element > Sources > script and find the JS file where that function is defined.
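To illustrate the point above: what sits in the page is a call expression, new WH.News.TodayInWoW(...), so the function name and its argument can be recovered as text, but not the function's code. A minimal sketch, assuming the script holds exactly one such call with a single JSON argument ending in ");"; split_call and CALL_RE are hypothetical names, and the sample string is a shortened stand-in for the real tag. Unlike the fixed [31:-11] slice, the regular expression tolerates extra whitespace and works on either str(tag) or the tag's inner text.

```python
import json
import re

# Matches "new <Name>(<argument>);" in the script text. Assumes a single call
# whose argument is valid JSON; this is an assumption about the page, not
# something the site guarantees.
CALL_RE = re.compile(r"new\s+([\w.]+)\((.*)\)\s*;", re.DOTALL)


def split_call(script_text):
    """Return (constructor name, parsed JSON argument) from a scraped <script>."""
    match = CALL_RE.search(script_text)
    if match is None:
        raise ValueError("no 'new Name(...);' call found")
    name, raw_args = match.groups()
    return name, json.loads(raw_args)


sample = '<script>new WH.News.TodayInWoW([{"zone": 10288}]);</script>'
name, args = split_call(sample)
print(name)  # WH.News.TodayInWoW
print(args)  # [{'zone': 10288}]
```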
