Following up on my earlier question, I have another question about using requests with finance.yahoo.com.
My request without a User-Agent header returns the HTML I need to collect some data from the website.
The call with 'ALB' as the parameter works fine and returns the requested data:
import bs4 as bs
import requests

def yahoo_summary_stats(stock):
    response = requests.get(f"https://finance.yahoo.com/quote/{stock}")
    #response = requests.get(f"https://finance.yahoo.com/quote/{stock}", headers={'User-Agent': 'Custom user agent'})
    soup = bs.BeautifulSoup(response.text, 'lxml')
    table = soup.find('p', {'class': 'D(ib) Va(t)'})
    sector = table.findAll('span')[1].text
    industry = table.findAll('span')[3].text
    print(f"{stock}: {sector}, {industry}")
    return sector, industry

yahoo_summary_stats('ALB')
Output:
ALB: Basic Materials, Specialty Chemicals
The call yahoo_summary_stats('AEE') doesn't work this way, so I need to add headers to request the site successfully.
But with the parameter headers={'User-Agent': 'Custom user agent'}, the code doesn't work: it cannot find the paragraph p with class 'D(ib) Va(t)'.
How can I solve this problem?
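When the paragraph is missing, soup.find() returns None, so the following .findAll() call fails with AttributeError. As a minimal sketch of making that case explicit, the parser below uses the standard library's html.parser instead of BeautifulSoup (swapped in only so the example needs no third-party packages). It assumes, as the code above does, that sector and industry are the second and fourth span inside <p class="D(ib) Va(t)">. SectorParser, parse_sector_industry and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class SectorParser(HTMLParser):
    """Collect the text of <span> elements inside <p class="D(ib) Va(t)">."""

    def __init__(self, target_class="D(ib) Va(t)"):
        super().__init__()
        self.target_class = target_class
        self.in_target = False  # currently inside the target <p>
        self.in_span = False    # currently inside a <span> within that <p>
        self.spans = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "p" and attrs.get("class") == self.target_class:
            self.in_target = True
        elif tag == "span" and self.in_target:
            self.in_span = True
            self.spans.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_target = False
        elif tag == "span":
            self.in_span = False

    def handle_data(self, data):
        # Text may arrive in chunks, so append to the current span
        if self.in_target and self.in_span:
            self.spans[-1] += data


def parse_sector_industry(html):
    """Return (sector, industry), or None if the expected markup is absent."""
    parser = SectorParser()
    parser.feed(html)
    if len(parser.spans) < 4:
        return None  # the page did not contain the expected <p>/<span> structure
    return parser.spans[1], parser.spans[3]


# Hypothetical sample markup mimicking the structure the original code expects
sample = ('<p class="D(ib) Va(t)"><span>Sector(s)</span>: <span>Basic Materials</span>'
          '<br/><span>Industry</span>: <span>Specialty Chemicals</span></p>')
print(parse_sector_industry(sample))  # ('Basic Materials', 'Specialty Chemicals')
```

If parse_sector_industry(response.text) returns None, the response simply did not contain the expected markup, which is the situation described above with the custom User-Agent.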
Answer:
I think you are fetching the wrong URL:
response = requests.get(f"https://finance.yahoo.com/quote/{stock}/profile?p={stock}", headers={'User-Agent': 'Custom user agent'})
Changing to the URL above, together with the User-Agent header, should solve it.
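A minimal sketch of preparing that same request with only the standard library (urllib swapped in for requests to keep it dependency-free). The helper name profile_request is hypothetical; the URL pattern and the 'Mozilla/5.0' value come from the answers in this thread. The ticker is percent-escaped so that symbols containing characters such as ^ stay valid in the URL; whether Yahoo serves such symbols at that address is an assumption.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def profile_request(stock, user_agent="Mozilla/5.0"):
    """Prepare (but do not send) a GET request for a ticker's Profile page.

    Hypothetical helper: URL pattern and User-Agent value are taken from the
    answers; nothing here is an official Yahoo API.
    """
    path = quote(stock, safe="")     # escape characters unsafe in a path segment
    query = urlencode({"p": stock})  # builds "p=<stock>"
    url = f"https://finance.yahoo.com/quote/{path}/profile?{query}"
    return Request(url, headers={"User-Agent": user_agent})


req = profile_request("AEE")
print(req.full_url)                  # https://finance.yahoo.com/quote/AEE/profile?p=AEE
# urllib stores header names via str.capitalize(), hence 'User-agent'
print(req.get_header("User-agent"))  # Mozilla/5.0
```

Sending it would be urllib.request.urlopen(req); with requests, the equivalent is the requests.get(url, headers=...) call used in this thread.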
Answer:
This page uses JavaScript to display the information, but requests and BeautifulSoup can't run JavaScript.
But when I check the page in a web browser with JavaScript disabled, I see this information on the Profile subpage:
"https://finance.yahoo.com/quote/{stock}/profile?p={stock}"
The code can get it for both stocks from this page, but it needs a User-Agent from a real browser (or at least the short version 'Mozilla/5.0'):
import bs4 as bs
import requests

def yahoo_summary_stats(stock):
    # The Profile subpage has sector/industry in the HTML itself (no JavaScript needed)
    url = f"https://finance.yahoo.com/quote/{stock}/profile?p={stock}"
    # Short form of a real browser's User-Agent
    headers = {'User-Agent': 'Mozilla/5.0'}
    print('url:', url)
    response = requests.get(url, headers=headers)
    soup = bs.BeautifulSoup(response.text, 'lxml')
    table = soup.find('p', {'class': 'D(ib) Va(t)'})
    spans = table.find_all('span')
    sector = spans[1].text
    industry = spans[3].text
    print(f"{stock}: {sector}, {industry}")
    return sector, industry

# --- main ---

result = yahoo_summary_stats('ALB')
print('result:', result)

result = yahoo_summary_stats('AEE')
print('result:', result)
Result:
url: https://finance.yahoo.com/quote/ALB/profile?p=ALB
ALB: Basic Materials, Specialty Chemicals
result: ('Basic Materials', 'Specialty Chemicals')
url: https://finance.yahoo.com/quote/AEE/profile?p=AEE
AEE: Utilities, Utilities—Regulated Electric
result: ('Utilities', 'Utilities—Regulated Electric')
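To run the function over more tickers without one missing page stopping the whole loop, a wrapper along these lines could be used. collect_profiles and fake_fetch are hypothetical; fetch stands for any function behaving like yahoo_summary_stats, and the one-second pause between requests is an arbitrary choice, not a documented Yahoo limit.

```python
import time


def collect_profiles(stocks, fetch, delay=1.0):
    """Call fetch(stock) for each ticker; return (results, failures).

    fetch is assumed to behave like yahoo_summary_stats: it returns a
    (sector, industry) tuple, or raises AttributeError/IndexError when the
    expected <p>/<span> markup is missing from the page.
    """
    results, failures = {}, []
    for i, stock in enumerate(stocks):
        if i and delay:
            time.sleep(delay)  # pause between consecutive requests
        try:
            results[stock] = fetch(stock)
        except (AttributeError, IndexError) as exc:
            failures.append((stock, repr(exc)))
    return results, failures


# Usage with a hypothetical offline stand-in for yahoo_summary_stats
def fake_fetch(stock):
    known = {"ALB": ("Basic Materials", "Specialty Chemicals")}
    if stock not in known:
        raise AttributeError("'NoneType' object has no attribute 'findAll'")
    return known[stock]


results, failures = collect_profiles(["ALB", "XYZ"], fake_fetch, delay=0)
print(results)  # {'ALB': ('Basic Materials', 'Specialty Chemicals')}
print(failures)
```

Only the two exception types that a missing element produces are caught, so network errors still surface instead of being silently skipped.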