Beautiful Soup HTML Not Matching 'View Page Source' in Browser

Time:04-28

I have been trying to scrape a web page using bs4, but the HTML I get back doesn't match what I see with 'View Page Source' in Chrome. I'm a novice in this area, so any guidance would be much appreciated. Details below.

An example target page and the code I used are shown below.

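import requests
from bs4 import BeautifulSoup

my_url = 'https://finance.yahoo.com/m/63c37511-b114-3718-a601-7e898a22439e/a-big-tech-encore-and-twitter.html'
# Fetch the HTML that the server returns to requests
response = requests.get(my_url)
doc = BeautifulSoup(response.text, "html.parser")

# Save the parsed document so it can be compared with the browser's page source
with open("output1.html", "w", encoding="utf-8") as file:
    file.write(str(doc))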

When I view the page source in my browser (Chrome), the HTML includes the snippet below:

"siteAttribute":"ticker=\"GOOGL;AAPL;PYPL;TWTR\"

However, in the file written by the code above, siteAttribute is different and no longer contains the same information. Instead, it shows:

"siteAttribute":"wiki_topics=\"Big_Tech;Apple_Inc.;Facebook;

After researching online, I still can't figure out what is causing the discrepancy. Thanks in advance.

CodePudding user response:

In Chrome, right-click the page and choose Inspect to open DevTools, then press Ctrl+F and paste siteAttribute":"ticker=\"GOOGL;AAPL;PYPL;TWTR\ into the search box. You will see that the desired value sits inside a script tag, so it is part of the embedded JavaScript data rather than an ordinary HTML element.
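
Since the value lives inside a script tag, one way to look for it from Python is to loop over the script tags and search their text. The sketch below is only an illustration: SAMPLE_HTML is a made-up stand-in for response.text, and find_site_attributes and the regular expression are hypothetical helpers, not part of bs4. It assumes the value is stored as a JSON string with escaped quotes, as in the snippet from the question.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text; the real page is much larger.
SAMPLE_HTML = r'''
<html><body>
<script>window.config = {"siteAttribute":"ticker=\"GOOGL;AAPL;PYPL;TWTR\"","lang":"en"};</script>
<script>console.log("unrelated");</script>
</body></html>
'''

# Captures the JSON string that follows "siteAttribute":, allowing escaped quotes.
SITE_ATTRIBUTE_RE = re.compile(r'"siteAttribute"\s*:\s*"((?:[^"\\]|\\.)*)"')


def find_site_attributes(html):
    """Return every decoded siteAttribute value found inside <script> tags."""
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for script in soup.find_all("script"):
        text = script.string or ""  # .string is None for empty script tags
        for raw in SITE_ATTRIBUTE_RE.findall(text):
            # Re-wrap in quotes so json.loads can decode the \" escapes.
            values.append(json.loads('"' + raw + '"'))
    return values


if __name__ == "__main__":
    print(find_site_attributes(SAMPLE_HTML))
```

Running the same function on response.text from the question's code would show which siteAttribute value, if any, the server actually sent to requests, which can then be compared with what the browser shows.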
