can't find html tag when I scrape web using beautifulsoup

Time:12-06

I am scraping a website with BeautifulSoup, but the HTML I get back doesn't match the page source shown in my web browser: some tags are missing. Below is my code:

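import requests
from bs4 import BeautifulSoup

URL = '<url>'
header = {'User-Agent': 'Mozilla/5.0'}  # example header; the original value was not shown
response = requests.get(URL, headers=header)
html_doc = BeautifulSoup(response.text, 'html.parser')  # parse the response body, not the Response object
content = html_doc.find('div', attrs={'class': 'content-wrapper'})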

I am not sure what is happening, but it may be related to an event listener: I found one right after this tag in the page source.

Answer:

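If the problem is caused by the event listener (that is, the missing tags are added by JavaScript after the page loads), I would suggest using Selenium together with BeautifulSoup. The requests library only downloads the HTML the server returns and does not run JavaScript, whereas Selenium drives a real browser. So let Selenium send the request and return the rendered page source, then parse that source with BeautifulSoup.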
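To see why requests plus BeautifulSoup cannot find such tags, here is a small offline sketch. RAW_HTML is a made-up page, not the real site: its content-wrapper div is only created by an event listener in a script, so it never exists in the markup that BeautifulSoup parses. The helper name find_content is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical raw HTML, as requests would receive it. The div with class
# "content-wrapper" is only created later, by JavaScript running in a browser.
RAW_HTML = """
<html><body>
  <div id="app"></div>
  <script>
    document.addEventListener('DOMContentLoaded', function () {
      var div = document.createElement('div');
      div.className = 'content-wrapper';
      document.getElementById('app').appendChild(div);
    });
  </script>
</body></html>
"""


def find_content(html):
    """Return the first <div class="content-wrapper">, or None if the markup lacks it."""
    soup = BeautifulSoup(html, 'html.parser')
    return soup.find('div', attrs={'class': 'content-wrapper'})


# BeautifulSoup treats the <script> body as plain text and never executes it,
# so the div the script would create is not in the parsed tree.
print(find_content(RAW_HTML))  # prints None
```

If find returns None on the HTML from requests but the element is visible in the browser's inspector, the element is most likely being added by JavaScript.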

Note that Selenium requires a browser driver (for Firefox, geckodriver). You can find installation instructions at this link (https://www.selenium.dev/documentation/getting_started/installing_browser_drivers/).

Example code using Firefox:

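import time

from bs4 import BeautifulSoup
from selenium import webdriver

URL = '<url>'
browser = webdriver.Firefox()  # needs Firefox and geckodriver installed
browser.get(URL)
time.sleep(1)  # give the page's JavaScript time to add the missing tags
html_doc = BeautifulSoup(browser.page_source, 'html.parser')  # parse the rendered page
browser.quit()  # close the browser and end the WebDriver session

content = html_doc.find('div', attrs={'class': 'content-wrapper'})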

Hope this helps!
