Webscraper Only Obtains Some Data

Time:08-12

I want to scrape data off of this website:

[screenshot of the page]

The normal ratios are obtained correctly, but my program could not obtain the GF Value Rank, the Financial Strength or the Profitability Rank.

I inspected the HTML of the gurufocus website and came across div sections with id="financial-strength" and id="profitability", but I'm not sure how to extract the scores from them.

As far as the GF Value Rank is concerned, I only found a span element that contains that score, but nothing like a JavaScript td entry or something similar.

How do I need to change my code to obtain the last three scores in my table?

CodePudding user response:

The GF Score is in the ('span', class_='t-primary el-popover__reference'), which sits inside the ('div', class_='flex flex-center justify-between'). If you can't match the exact div and class, you can walk down the tree instead, e.g. div.div.span.

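Or something like this:

var = []  # collected score strings
# find_all returns every matching div, so each one can be searched in turn
for div in soup.find_all('div', class_='flex flex-center justify-between'):
    for span in div.find_all('span', class_='t-primary el-popover__reference'):
        var.append(span.text.strip())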
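
One caveat with the class_ lookup above: a string holding several classes only matches when the class attribute is exactly that string, whereas a CSS selector tests each class on its own. Here is a minimal sketch on a made-up HTML fragment. The markup and the value 98/100 are assumptions for illustration and are not copied from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real page's markup may differ.
SAMPLE_HTML = """
<div class="flex flex-center justify-between">
  <span class="el-popover__reference t-primary">98/100</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-class string passed to class_ must equal the attribute value exactly,
# so the swapped class order in the sample is not matched here.
exact = soup.find_all("span", class_="t-primary el-popover__reference")

# A CSS selector checks each class separately, so the order does not matter.
by_css = soup.select("div.flex.justify-between span.t-primary.el-popover__reference")

scores = [span.get_text(strip=True) for span in by_css]
print(len(exact), scores)
```

If the site ever reorders or adds classes on that span, the selector form should keep working while the exact-string form silently returns nothing.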

CodePudding user response:

This is a very complex page, so you should use CSS selectors (or, probably even better, XPath with lxml, though I haven't tried that for this page).

So try it this way:

import requests
import pandas as pd
from bs4 import BeautifulSoup

ls=['Ticker','Debt-to-EBITDA','Gross Margin %','PE Ratio','Financial Strength','Profitability Rank','GF Value Rank']
symbols = ['AAPL', 'TSLA']
rows = []
for t in symbols:
    req = requests.get("https://www.gurufocus.com/stock/" + t)
    if req.status_code !=200:
        continue
    soup = BeautifulSoup(req.content, 'html.parser')
    scores = [t]

    #this is where the css selectors come in - you have to use 4 of these
    #because this is how the data is distributed in the page

    measures = [mea.text.strip() for mea in soup.select('td.t-caption > a')]
    vals = [va.text.strip() for va in soup.select('td.t-caption span.p-l-sm') ]
    measures2 = [mea.text.strip() for mea in soup.select('h2.t-h6 >a')]
    vals2 = [va.text.strip() for va in soup.select('div.flex.flex-center span.t-body-sm.m-l-md')]

    all_meas = measures + measures2
    all_vals = vals + vals2

    for m,v in zip(all_meas,all_vals):
        if m in ls:
            scores.append(v)
    rows.append(scores)
df = pd.DataFrame(rows,columns=ls)
df

Output:

   Ticker  Debt-to-EBITDA  Gross Margin %  PE Ratio  Financial Strength  Profitability Rank  GF Value Rank
0  AAPL    0.91            43.31           27.93     7/10                10/10               6/10
1  TSLA    0.47            27.1            106.39    8/10                5/10                9/10
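
One thing to watch in the loop above: it appends values in page order and assumes every measure in ls is found, so a missing measure could leave a row too short or put values under the wrong column. Below is a minimal sketch of a more forgiving variant that keys each value by its measure name. build_row is a hypothetical helper, and the sample input simply reuses the AAPL figures from the output above with 'PE Ratio' left out on purpose.

```python
from typing import Dict, List, Optional


def build_row(ticker: str, measures: List[str], values: List[str],
              wanted: List[str]) -> Dict[str, Optional[str]]:
    """Map each wanted measure to its scraped value, or None if it is absent."""
    found = dict(zip(measures, values))
    row: Dict[str, Optional[str]] = {"Ticker": ticker}
    for name in wanted:
        if name != "Ticker":
            row[name] = found.get(name)  # None when the page lacked this measure
    return row


ls = ['Ticker', 'Debt-to-EBITDA', 'Gross Margin %', 'PE Ratio',
      'Financial Strength', 'Profitability Rank', 'GF Value Rank']

# Sample input in a different order than ls, with 'PE Ratio' deliberately missing.
measures = ['Gross Margin %', 'Debt-to-EBITDA', 'Financial Strength',
            'Profitability Rank', 'GF Value Rank']
values = ['43.31', '0.91', '7/10', '10/10', '6/10']

row = build_row('AAPL', measures, values, ls)
print(row)
```

A list of such dictionaries can be passed to pd.DataFrame(rows, columns=ls); a measure that was not found then shows up as a missing value instead of shifting the others.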