Scraped Python results have changed from numbers to NaN


The last time I ran this code, in February, it gave me proper results like this:

      Sales    Income
AAPL  365.82B  94.68B
MSFT  184.90B  71.19B
TSLA  53.82B   5.52B
FB    112.33B  40.30B

Now I get this with NaN instead of the numbers. The Finviz website looks to be using the exact same table as back in February. Can anyone figure out what has changed? Thanks.

      Sales    Income
AAPL  365.82B  94.68B




import pandas as pd
from bs4 import BeautifulSoup as bs
import requests
import numpy as np

# For custom list of stocks, edit this list below, otherwise leave commented out
v1 = ['AAPL','MSFT','TSLA','FB','BRK-B','TSM','NVDA','V','JNJ','JPM','WMT','PG','BAC','HD','BABA','TM','XOM','PFE','DIS','KO']
# Header required to scrape from Finviz
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
           'Upgrade-Insecure-Requests': '1', 'Cookie': 'v2=1495343816.', 'Accept-Encoding': 'gzip, deflate, sdch',
           'Referer': "http://finviz.com/quote.ashx?t="}
# This function is what is used to find the metric of interest and return it
def fundamental_metric(soup, metric):
    return soup.find(text=metric).find_next(class_='snapshot-td2').text
# This function iterates through the index of the data frame (stock_list) and uses the fundamental_metric function to find the metric on Finviz for that stock
# Any stock in the list that cannot be scraped will print an error before moving on to the next stock
def get_fundamental_data(df):
    for symbol in df.index:
        try:
            #url = ("http://finviz.com/quote.ashx?t=" + symbol.lower())
            r = requests.get("http://finviz.com/quote.ashx?t=" + symbol.lower(),headers=headers)
            soup = bs(r.content,'html.parser')
            for m in df.columns:
                output = fundamental_metric(soup,m)
                df.loc[symbol,m] = output
                df.replace(['-'], np.NaN)
        except Exception as e:
            print(symbol, 'Not Found')
        return df
# List of metrics to scrape
# Before adding any metrics, ensure the metric being added is available on Finviz and the name is matched identically
metric = ['Sales','Income']
df = pd.DataFrame(index = v1, columns = metric)
df = get_fundamental_data(df)

CodePudding user response:

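Your code was only running for the first symbol, because `return df` was indented inside the `for` loop, so the function returned at the end of the first iteration. Moving the `return` one level out fixes it: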

def get_fundamental_data(df):
    for symbol in df.index:
        try:
            # url = ("http://finviz.com/quote.ashx?t=" + symbol.lower())
            r = requests.get("http://finviz.com/quote.ashx?t=" + symbol.lower(), headers=headers)
            soup = bs(r.content, 'html.parser')
            for m in df.columns:
                output = fundamental_metric(soup, m)
                df.loc[symbol, m] = output
                df.replace(['-'], np.NaN)
        except Exception as e:
            print(symbol, 'Not Found')
    return df  # dedented one level so it runs after the loop, not inside it
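
To see the effect of the indentation without hitting Finviz, here is a minimal sketch that swaps the network request for a hypothetical `fetch_metric` stub returning canned values (the stub and both function names are made up for illustration; the figures are the ones from the question). The only difference between the two functions is where `return df` sits.

```python
import pandas as pd


def fetch_metric(symbol, metric):
    """Hypothetical stand-in for the Finviz request; returns canned values."""
    canned = {
        ("AAPL", "Sales"): "365.82B", ("AAPL", "Income"): "94.68B",
        ("MSFT", "Sales"): "184.90B", ("MSFT", "Income"): "71.19B",
    }
    return canned[(symbol, metric)]  # raises KeyError for unknown symbols


def fill_return_inside_loop(df):
    for symbol in df.index:
        try:
            for m in df.columns:
                df.loc[symbol, m] = fetch_metric(symbol, m)
        except Exception:
            print(symbol, "Not Found")
        return df  # bug: leaves the function after the first symbol


def fill_return_after_loop(df):
    for symbol in df.index:
        try:
            for m in df.columns:
                df.loc[symbol, m] = fetch_metric(symbol, m)
        except Exception:
            print(symbol, "Not Found")
    return df  # runs once, after every symbol has been processed


symbols = ["AAPL", "MSFT"]
metrics = ["Sales", "Income"]
buggy = fill_return_inside_loop(pd.DataFrame(index=symbols, columns=metrics))
fixed = fill_return_after_loop(pd.DataFrame(index=symbols, columns=metrics))
print(buggy)  # only the AAPL row is filled; MSFT stays NaN
print(fixed)  # both rows are filled
```

With the `return` inside the loop, every row after the first keeps the NaN it was created with, which matches the output in the question.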