I am scraping a website using BeautifulSoup, but the HTML output doesn't match the page source shown in my web browser: some tags are missing. Below is my code:
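import requests
from bs4 import BeautifulSoup
URL = '<url>'
header = {'User-Agent': 'Mozilla/5.0'}  # assumed value: the real headers are not shown in the question
response = requests.get(URL, headers=header)
html_doc = BeautifulSoup(response.text, 'html.parser')  # parse the response body, not the Response object
content = html_doc.find('div', attrs={'class': 'content-wrapper'})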
I am not sure what is happening, but it may be related to an event listener. I found one right after this tag in the page source.
Answer:
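If the problem is caused by the event listener (that is, by JavaScript running in the page), I would suggest using BeautifulSoup together with Selenium to scrape this website. requests only downloads the initial HTML and never runs the page's scripts, so any elements that JavaScript adds after the page loads are missing from its output. Selenium drives a real browser that does run those scripts, so let Selenium load the page, take the resulting page source, and then parse it with BeautifulSoup.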
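Before adding Selenium, one way to confirm that the missing tags are created by JavaScript is to check whether they are present in the raw HTML at all. A minimal sketch; has_target_div is a hypothetical helper name, and the class name content-wrapper is taken from the question:

```python
from bs4 import BeautifulSoup


def has_target_div(html: str, class_name: str = "content-wrapper") -> bool:
    """Return True if the given HTML already contains a <div> with class_name."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", attrs={"class": class_name}) is not None


if __name__ == "__main__":
    # The div is written directly into the HTML, so BeautifulSoup finds it.
    static_page = '<html><body><div class="content-wrapper">Hi</div></body></html>'
    # Here the div would only exist after the script runs in a browser.
    # BeautifulSoup never executes <script>, so the div never appears in the parsed tree.
    scripted_page = (
        "<html><body><script>"
        "var d = document.createElement('div');"
        "d.className = 'content-wrapper';"
        "document.body.appendChild(d);"
        "</script></body></html>"
    )
    print(has_target_div(static_page))    # True
    print(has_target_div(scripted_page))  # False
```

Against the real site you would pass requests.get('<url>', headers=header).text instead. If that returns False while the browser clearly shows the div, the content is most likely rendered by JavaScript.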
Note that Selenium needs a browser driver (geckodriver for Firefox). You can find the drivers via this link (https://www.selenium.dev/documentation/getting_started/installing_browser_drivers/). Recent Selenium 4 releases can also download a matching driver automatically.
Example code using Firefox:
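import time
from bs4 import BeautifulSoup
from selenium import webdriver
URL = '<url>'
browser = webdriver.Firefox()
try:
    browser.get(URL)
    time.sleep(1)  # give the page's JavaScript a moment to run BEFORE reading the DOM
    html_doc = BeautifulSoup(browser.page_source, 'html.parser')
finally:
    browser.quit()  # quit() closes the browser and stops the driver process
content = html_doc.find('div', attrs={'class': 'content-wrapper'})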
Hope this helps!