I am new to web scraping and am trying to extract a value from Yahoo Finance. I am using pandas read_html with its match argument to pick out the right table among those on the page, with the following code:
#Get 5 year growth estimate------------------------------------------------
url_link = "https://finance.yahoo.com/quote/" + str(STOCK) + "/analysis?p=" + str(STOCK)
r = requests.get(url_link,headers ={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'})
read_html_pandas_data = pd.read_html(r.text,match = STOCK)
table = read_html_pandas_data
print(table)
I am passing in different STOCK strings, such as 'ABC'. The result is a list of length 1:
[          Growth Estimates     ABC  Industry  Sector(s)  S&P 500
0              Current Qtr.  19.00%       NaN        NaN      NaN
1                 Next Qtr.   7.90%       NaN        NaN      NaN
2              Current Year  18.40%       NaN        NaN      NaN
3                 Next Year   6.00%       NaN        NaN      NaN
4  Next 5 Years (per annum)  10.69%       NaN        NaN      NaN
5  Past 5 Years (per annum)   8.70%       NaN        NaN      NaN]
The value I want is 10.69%, but I am having trouble extracting it properly. I was using a different method before, but the order of the tables changes depending on the stock URL, so I wanted to try this approach to be more consistent.
CodePudding user response:
I recommend setting the index to a column you can look rows up by, then accessing the row through that index. For example:
table.set_index('Growth Estimates', inplace=True)
table.loc['Next 5 Years (per annum)']
Note that pd.read_html returns a list of DataFrames, so right now table is a list. Select the DataFrame first, before calling set_index:
table = read_html_pandas_data[0]
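Put in the right order, the steps from this answer might look like the sketch below. It rebuilds the printed table by hand so it runs without a network request; the tables list is only a stand-in for what pd.read_html returned in the question, and 'ABC' is the placeholder ticker used there.

```python
import pandas as pd

STOCK = "ABC"  # placeholder ticker from the question

# Hand-built stand-in for the one-element list returned by pd.read_html.
nan = float("nan")
tables = [pd.DataFrame({
    "Growth Estimates": [
        "Current Qtr.", "Next Qtr.", "Current Year", "Next Year",
        "Next 5 Years (per annum)", "Past 5 Years (per annum)",
    ],
    STOCK: ["19.00%", "7.90%", "18.40%", "6.00%", "10.69%", "8.70%"],
    "Industry": [nan] * 6,
    "Sector(s)": [nan] * 6,
    "S&P 500": [nan] * 6,
})]

# Take the DataFrame out of the list first, then index it by row label.
table = tables[0].set_index("Growth Estimates")

# .loc with only a row label returns the whole row as a Series.
row = table.loc["Next 5 Years (per annum)"]

# Adding the column label narrows it down to the single cell.
value = table.loc["Next 5 Years (per annum)", STOCK]
print(value)
```

Using set_index without inplace=True returns a new DataFrame, which avoids modifying the object that came out of the list.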
CodePudding user response:
Try this:
# Match on the table's header text so the right table is selected regardless of its position on the page
table = pd.read_html(r.text, match="Growth Estimates")[0]
# Get the value you want (the result is a one-element Series, not a bare string)
table.loc[table["Growth Estimates"] == "Next 5 Years (per annum)", STOCK]
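The boolean-mask lookup above returns a Series, so one more step is needed to get the bare value. The sketch below shows that step on a hand-built copy of the table from the question (no network request); the names matches, growth_text and growth are illustrative, and the conversion to a fraction is an optional extra that the question did not ask for.

```python
import pandas as pd

STOCK = "ABC"  # placeholder ticker from the question

# Hand-built copy of the relevant columns of the "Growth Estimates" table.
table = pd.DataFrame({
    "Growth Estimates": [
        "Current Qtr.", "Next Qtr.", "Current Year", "Next Year",
        "Next 5 Years (per annum)", "Past 5 Years (per annum)",
    ],
    STOCK: ["19.00%", "7.90%", "18.40%", "6.00%", "10.69%", "8.70%"],
})

# Boolean mask on the label column, restricted to the ticker column.
matches = table.loc[table["Growth Estimates"] == "Next 5 Years (per annum)", STOCK]

# Guard against a page layout where the row is missing.
if matches.empty:
    raise LookupError("row 'Next 5 Years (per annum)' not found")

growth_text = matches.iloc[0]                  # first (and only) match, as a string
growth = float(growth_text.rstrip("%")) / 100  # optional: percent string to fraction
print(growth_text)
```

Using .iloc[0] rather than a fixed index label keeps this working even if the row moves to a different position in the table.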