Python: request to website doesn't give the HTML I need in all cases

Following up on my earlier question, I have a further question about requests to finance.yahoo.com.

A request without a User-Agent header returns the HTML I need to collect some data from the website.

The call with 'ALB' as the parameter works fine; I get the requested data:

import bs4 as bs
import requests

def yahoo_summary_stats(stock):
    response = requests.get(f"https://finance.yahoo.com/quote/{stock}")
    #response = requests.get(f"https://finance.yahoo.com/quote/{stock}", headers={'User-Agent': 'Custom user agent'})

    soup = bs.BeautifulSoup(response.text, 'lxml')

    # find() returns None if no <p> with this class is in the HTML
    table = soup.find('p', {'class': 'D(ib) Va(t)'})

    spans = table.find_all('span')
    sector = spans[1].text
    industry = spans[3].text
    print(f"{stock}: {sector}, {industry}")
    return sector, industry

yahoo_summary_stats('ALB')

Output:

ALB: Basic Materials, Specialty Chemicals

The call yahoo_summary_stats('AEE') doesn't work this way, so I need to activate the headers to request the site successfully.

But with the parameter headers={'User-Agent': 'Custom user agent'}, the code doesn't work: soup.find() cannot find the paragraph p with class 'D(ib) Va(t)'.

How can I solve this problem?

Answer:

I think you are fetching the wrong URL:

response = requests.get(f"https://finance.yahoo.com/quote/{stock}/profile?p={stock}", headers={'User-Agent': 'Custom user agent'})

Switching to the URL above, together with the User-Agent header, should solve the problem.
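A minimal sketch of the request this answer describes, built but not sent, so the URL and header can be inspected offline. It swaps in the standard library's urllib.request.Request in place of requests.get; build_profile_request is a hypothetical helper name, and the 'Mozilla/5.0' default is the short browser-style value the other answer in this thread reports working.

```python
import urllib.request
from urllib.parse import quote, urlencode


def build_profile_request(stock, user_agent="Mozilla/5.0"):
    """Hypothetical helper: build (but do not send) a GET request for the
    Yahoo Finance profile page of `stock`, with a User-Agent header set."""
    # quote() percent-encodes characters that are unsafe in a URL path;
    # plain tickers such as 'ALB' pass through unchanged.
    url = f"https://finance.yahoo.com/quote/{quote(stock)}/profile?{urlencode({'p': stock})}"
    # Nothing is sent here; urllib.request.urlopen(req) would perform the fetch.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_profile_request("ALB")
print(req.full_url)                  # https://finance.yahoo.com/quote/ALB/profile?p=ALB
print(req.get_header("User-agent"))  # Mozilla/5.0 (urllib stores the name as 'User-agent')
```

With requests, the same thing can be written as requests.get(f"https://finance.yahoo.com/quote/{stock}/profile", params={'p': stock}, headers={'User-Agent': 'Mozilla/5.0'}); requests builds the ?p= query string from params.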

Answer:

This page uses JavaScript to display the information, but requests and BeautifulSoup can't run JavaScript.

However, when I check the page in a web browser with JavaScript disabled, I see this information on the Profile subpage:

"https://finance.yahoo.com/quote/{stock}/profile?p={stock}"

The code can get it for both stocks from this page, but it needs a User-Agent from a real browser (or at least the short version 'Mozilla/5.0'):

import bs4 as bs
import requests

def yahoo_summary_stats(stock):

    url = f"https://finance.yahoo.com/quote/{stock}/profile?p={stock}"

    headers = {'User-Agent': 'Mozilla/5.0'}

    print('url:', url)
    
    response = requests.get(url, headers=headers)

    soup = bs.BeautifulSoup(response.text, 'lxml')

    table = soup.find('p', {'class': 'D(ib) Va(t)'})

    spans = table.find_all('span')
    sector = spans[1].text
    industry = spans[3].text

    print(f"{stock}: {sector}, {industry}")

    return sector, industry

# --- main ---

result = yahoo_summary_stats('ALB')
print('result:', result)

result = yahoo_summary_stats('AEE')
print('result:', result)

Result:

url: https://finance.yahoo.com/quote/ALB/profile?p=ALB
ALB: Basic Materials, Specialty Chemicals
result: ('Basic Materials', 'Specialty Chemicals')

url: https://finance.yahoo.com/quote/AEE/profile?p=AEE
AEE: Utilities, Utilities—Regulated Electric
result: ('Utilities', 'Utilities—Regulated Electric')
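If you want to test the parsing step without hitting the network each time, one option is to separate fetching from parsing and fail with a clear message when the paragraph is missing. This sketch swaps BeautifulSoup for the standard library's html.parser.HTMLParser so it has no third-party dependencies; SpanCollector and parse_sector_industry are hypothetical names, and the sample markup is made up to mimic the span layout that the [1]/[3] indices assume, not copied from Yahoo's page.

```python
from html.parser import HTMLParser


class SpanCollector(HTMLParser):
    """Collect the text of every <span> inside the first <p> whose class
    attribute equals `target_class` exactly (no per-class matching)."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.found = False    # saw the target <p> at all
        self.inside = False   # currently inside the target <p>
        self.in_span = False  # currently inside a <span> within it
        self.spans = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.found and dict(attrs).get("class") == self.target_class:
            self.found = True
            self.inside = True
        elif tag == "span" and self.inside:
            self.in_span = True
            self.spans.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False
        elif tag == "p":
            self.inside = False

    def handle_data(self, data):
        if self.in_span:
            self.spans[-1] += data


def parse_sector_industry(html, target_class="D(ib) Va(t)"):
    parser = SpanCollector(target_class)
    parser.feed(html)
    parser.close()
    if not parser.found:
        # The situation where soup.find(...) returns None in the code above.
        raise ValueError(f"no <p class={target_class!r}> found; the server sent different HTML")
    if len(parser.spans) < 4:
        raise ValueError(f"expected at least 4 <span> elements, got {len(parser.spans)}")
    # Same positions the answer uses: spans[1] is the sector, spans[3] the industry.
    return parser.spans[1].strip(), parser.spans[3].strip()


# Hypothetical sample markup, only for illustration.
sample = ('<p class="D(ib) Va(t)"><span>Sector(s)</span>: <span>Basic Materials</span><br/>'
          '<span>Industry</span>: <span>Specialty Chemicals</span></p>')
print(parse_sector_industry(sample))  # ('Basic Materials', 'Specialty Chemicals')
```

In the answer's function you would pass response.text to parse_sector_industry; a ValueError naming the missing element is easier to diagnose than the AttributeError raised when .find_all() is called on None. Note that this compares the whole class attribute string, while BeautifulSoup can also match individual classes.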