I have been trying to scrape a web page using bs4, but the HTML doesn't seem to match what I see when using 'view page source' in Chrome. As a novice in this area, I would much appreciate any guidance! Details below:
The target web page (see my_url) and the code I used are shown below.
import requests
from bs4 import BeautifulSoup
my_url = 'https://finance.yahoo.com/m/63c37511-b114-3718-a601-7e898a22439e/a-big-tech-encore-and-twitter.html'
response = requests.get(my_url)
doc = BeautifulSoup(response.text, "html.parser")
with open("output1.html", "w") as file:
    file.write(str(doc))
When viewing the page source in my browser (Chrome), the snippet below is included in the HTML:
"siteAttribute":"ticker=\"GOOGL;AAPL;PYPL;TWTR\"
However, in the file written by the code above, the siteAttribute has changed and no longer has the same information. Instead, it shows:
"siteAttribute":"wiki_topics=\"Big_Tech;Apple_Inc.;Facebook;
After researching online, I still can't figure out what is causing the discrepancy. Thanks in advance.
CodePudding user response:
If you right-click the page and choose Inspect to open Chrome DevTools, then press Ctrl+F and paste siteAttribute":"ticker=\"GOOGL;AAPL;PYPL;TWTR\
you will see that the desired result is inside a script tag. Please see the screenshot.
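Since the value sits inside a script tag, it may help to list every siteAttribute entry in the downloaded HTML rather than looking only at the first one, because a page can contain more than one. Below is a minimal sketch of that idea, not a definitive solution. It uses only the Python standard library so it runs without a network call; ScriptCollector, find_site_attributes and sample_html are hypothetical names made up for this illustration, and sample_html is an invented stand-in for the real page, not its actual content.

```python
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the raw text inside every <script> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text can arrive in several chunks, so append to the current entry.
        if self._in_script:
            self.scripts[-1] += data


# Matches "siteAttribute":"..." where the value may contain escaped quotes (\").
SITE_ATTRIBUTE_RE = re.compile(r'"siteAttribute":"((?:[^"\\]|\\.)*)"')


def find_site_attributes(html):
    """Return every siteAttribute value found inside script tags, in document order."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    values = []
    for script in collector.scripts:
        values.extend(SITE_ATTRIBUTE_RE.findall(script))
    return values


# Invented sample standing in for the real page (assumption, not the actual HTML).
sample_html = (
    '<html><head><script>var a = {"siteAttribute":"wiki_topics=\\"Big_Tech;Apple_Inc.\\""};</script></head>'
    '<body><script>var b = {"siteAttribute":"ticker=\\"GOOGL;AAPL;PYPL;TWTR\\""};</script></body></html>'
)

for value in find_site_attributes(sample_html):
    print(value)
```

With the real page you would pass response.text from the question's code instead of sample_html. If the ticker entry shows up in that list, the data is in the response and the file simply contains more than one siteAttribute; if it does not, the server returned different HTML to requests than to the browser.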