I want to scrape data off of this website:
The normal ratios are obtained correctly, but the GF Value Rank, the Financial Strength and the Profitability Rank couldn't be obtained with the program above.
I inspected the page source of the GuruFocus website and came across div sections with id="financial-strength" and id="profitability", but I'm not sure how to extract the scores from them.
As far as the GF Value Rank is concerned, I only found a span section that covers that score, but nothing like a javascript td entry or something similar.
How do I need to change my code to obtain the last three scores in my table?
CodePudding user response:
The GF Score is in the span ('span', class_='t-primary el-popover__reference'), which sits inside the div ('div', class_='flex flex-center justify-between').
If you're unable to match the exact div and class, you can walk down the tree with something like div.div.span, or loop over the matches with something like this:
var = []
for div in soup.find_all('div', class_='flex flex-center justify-between'):
    for span in div.find_all('span', class_='t-primary el-popover__reference'):
        var.append(span.get_text(strip=True))
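One thing to watch with the loop above: when class_ is given a string containing several classes, BeautifulSoup matches it against the exact attribute string, so a different class order or an extra class on the live page would make the lookup miss. A CSS selector that chains the classes avoids that. Here is a minimal sketch on a made-up HTML fragment; the markup and the 96/100 value are only illustrative assumptions, not copied from the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration; the live page's markup may differ.
html = """
<div class="justify-between flex flex-center">
  <div><span class="el-popover__reference t-primary">96/100</span></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Chained class selectors match regardless of the order of classes in the attribute.
spans = soup.select(
    "div.flex.flex-center.justify-between span.t-primary.el-popover__reference"
)
values = [span.get_text(strip=True) for span in spans]
print(values)
```

Note that the classes in the fragment are deliberately listed in a different order than in the selector, and the selector still matches.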
CodePudding user response:
This is a very complex page, so you should use CSS selectors (or possibly XPath with lxml, which may work even better, though I haven't tried it on this page).
So try it this way:
import requests
import pandas as pd
from bs4 import BeautifulSoup

ls = ['Ticker', 'Debt-to-EBITDA', 'Gross Margin %', 'PE Ratio', 'Financial Strength', 'Profitability Rank', 'GF Value Rank']
symbols = ['AAPL', 'TSLA']
rows = []
for t in symbols:
    req = requests.get("https://www.gurufocus.com/stock/" + t)
    if req.status_code != 200:
        continue
    soup = BeautifulSoup(req.content, 'html.parser')
    scores = [t]
    # this is where the css selectors come in - you have to use 4 of these
    # because this is how the data is distributed in the page
    measures = [mea.text.strip() for mea in soup.select('td.t-caption > a')]
    vals = [va.text.strip() for va in soup.select('td.t-caption span.p-l-sm')]
    measures2 = [mea.text.strip() for mea in soup.select('h2.t-h6 > a')]
    vals2 = [va.text.strip() for va in soup.select('div.flex.flex-center span.t-body-sm.m-l-md')]
    all_meas = measures + measures2
    all_vals = vals + vals2
    # keep only the measures listed in ls, in the order they appear on the page
    for m, v in zip(all_meas, all_vals):
        if m in ls:
            scores.append(v)
    rows.append(scores)
df = pd.DataFrame(rows, columns=ls)
df
Output:
  Ticker Debt-to-EBITDA Gross Margin % PE Ratio Financial Strength Profitability Rank GF Value Rank
0   AAPL           0.91          43.31    27.93               7/10              10/10          6/10
1   TSLA           0.47           27.1   106.39               8/10               5/10          9/10
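The zip-and-filter step in the code above appends values in the order they appear on the page, so it relies on the page listing the measures in the same order as ls and on none of them being missing. If that ever stops being true, a lookup by measure name may be safer. A minimal sketch, where build_row is a hypothetical helper name and the sample lists are made up for illustration:

```python
def build_row(ticker, measures, values, columns):
    """Return one row ordered like `columns`; measures not found become None."""
    found = dict(zip(measures, values))
    # columns[0] is the 'Ticker' column, so look up only the remaining names.
    return [ticker] + [found.get(col) for col in columns[1:]]


# Made-up sample data: page order differs from column order, one measure is absent.
columns = ['Ticker', 'Debt-to-EBITDA', 'PE Ratio', 'GF Value Rank']
measures = ['PE Ratio', 'Debt-to-EBITDA', 'Something Else']
values = ['27.93', '0.91', '1.0']

row = build_row('AAPL', measures, values, columns)
print(row)  # ['AAPL', '0.91', '27.93', None]
```

In the loop above this would be used as rows.append(build_row(t, all_meas, all_vals, ls)), which always yields one value per column, so pd.DataFrame(rows, columns=ls) does not fail on a row of the wrong length.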