Trouble scraping values from url link

I am new to web scraping and am trying to extract a value from Yahoo Finance. I am using pandas read_html with its match argument to pick out the right table among those on the page, with the following code:

import pandas as pd
import requests

# Get 5 year growth estimate
# STOCK is the ticker string, e.g. 'ABC'
url_link = "https://finance.yahoo.com/quote/" + str(STOCK) + "/analysis?p=" + str(STOCK)
r = requests.get(url_link, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'})
# read_html returns a list of every table whose text matches STOCK
read_html_pandas_data = pd.read_html(r.text, match=STOCK)
table = read_html_pandas_data
print(table)

I am passing in different STOCK strings, such as 'ABC'. I get a list of length 1 containing this DataFrame:

[           Growth Estimates     ABC  Industry  Sector(s)  S&P 500
0              Current Qtr.  19.00%       NaN        NaN      NaN
1                 Next Qtr.   7.90%       NaN        NaN      NaN
2              Current Year  18.40%       NaN        NaN      NaN
3                 Next Year   6.00%       NaN        NaN      NaN
4  Next 5 Years (per annum)  10.69%       NaN        NaN      NaN
5  Past 5 Years (per annum)   8.70%       NaN        NaN      NaN]

The value I want is 10.69%, but I am having trouble extracting it properly. I was using a different method before, but the order of the tables changes depending on the stock URL, so I wanted to try this approach to be more consistent.

CodePudding user response：

I recommend setting the index to a column with meaningful labels and then looking the row up by its label. For example:

table.set_index('Growth Estimates', inplace=True)
table.loc['Next 5 Years (per annum)']

Note that right now table is a list of DataFrames, not a single DataFrame, so before doing that you may want to do:

table = read_html_pandas_data[0]
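
Putting the two steps in order, a minimal sketch might look like the following. To stay runnable without a network call, it rebuilds the table from the question as an in-memory DataFrame instead of fetching the page, so the values and STOCK = "ABC" are simply those shown in the question; the names growth_row and five_year are illustrative and not part of the original code.

```python
import pandas as pd

STOCK = "ABC"  # ticker used in the question

# Stand-in for the result of pd.read_html, which returns a list of DataFrames
read_html_pandas_data = [
    pd.DataFrame(
        {
            "Growth Estimates": [
                "Current Qtr.",
                "Next Qtr.",
                "Current Year",
                "Next Year",
                "Next 5 Years (per annum)",
                "Past 5 Years (per annum)",
            ],
            STOCK: ["19.00%", "7.90%", "18.40%", "6.00%", "10.69%", "8.70%"],
        }
    )
]

# Take the single DataFrame out of the list first
table = read_html_pandas_data[0]

# Label the rows by estimate name so they can be looked up directly
table = table.set_index("Growth Estimates")

# .loc with only a row label returns the whole row as a Series
growth_row = table.loc["Next 5 Years (per annum)"]

# .loc with a row label and a column label returns just that cell
five_year = table.loc["Next 5 Years (per annum)", STOCK]

print(five_year)
```

Passing the column name as the second argument to .loc is what narrows the result from the whole row down to the single cell the question asks for.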

CodePudding user response：

Try this:

# Make it clear what table you are interested in
table = pd.read_html(r.text, match="Growth Estimates")[0]

# Get the value you want
table.loc[table["Growth Estimates"] == "Next 5 Years (per annum)", STOCK]
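
One thing to be aware of: a boolean-mask lookup like this returns a one-element Series, not a bare string. Below is a minimal sketch of pulling out the scalar and, if a number is needed, converting the percentage text to a float. It uses a small hand-built table in place of the scraped one, and the names matches, value_text and value are illustrative only.

```python
import pandas as pd

STOCK = "ABC"  # ticker used in the question

# Hand-built stand-in for the scraped "Growth Estimates" table
table = pd.DataFrame(
    {
        "Growth Estimates": ["Next Year", "Next 5 Years (per annum)"],
        STOCK: ["6.00%", "10.69%"],
    }
)

# Boolean mask on the label column, then select the ticker column:
# the result is a Series holding every matching cell
matches = table.loc[table["Growth Estimates"] == "Next 5 Years (per annum)", STOCK]

# Guard against the row being absent (e.g. the page layout changed)
if matches.empty:
    raise ValueError("Row 'Next 5 Years (per annum)' not found")

# First (and here only) match as a plain string
value_text = matches.iloc[0]

# Optional: strip the percent sign and convert to a fraction
value = float(value_text.rstrip("%")) / 100

print(value_text, value)
```

Checking matches.empty before indexing avoids an IndexError when the row is missing, which can happen if a ticker has no analyst estimates.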