I am scraping data from Yahoo Finance, and the scraping itself works fine. But when I try to store the appended list in an indexable DataFrame, I get a blank DataFrame, whereas storing the data in a non-indexable DataFrame works.
When I print temp I can see the data, and converting temp into a DataFrame also succeeds. But when I run financial_dir[ticker]=temp.append(soup.find('div', {'class' : "D(tbrg)"}).find_all('div')[i].get_text(separator='|').split('|'))
it does not create an indexable DataFrame; it returns an empty one.
I want to build financial_dir so that it can be indexed by stock: for example, financial_dir['INDUSINDBK.NS'] should return the DataFrame for INDUSINDBK.NS, like the one in the image. Any help would be greatly appreciated.
'''
import requests
from bs4 import BeautifulSoup
import pandas as pd
tickers = ['KOTAKBANK.NS','WIPRO.NS','HINDALCO.NS','RELIANCE.NS',
           'INDUSINDBK.NS','HDFCLIFE.NS','TATACONSUM.NS','TITAN.NS',
           'ULTRACEMCO.NS']
financial_dir = pd.DataFrame()
temp = []
for ticker in tickers:
    url = 'https://finance.yahoo.com/quote/' + ticker + '/financials?p=' + ticker
    page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'})
    #page_content = page.content
    soup = BeautifulSoup(page.text, 'html.parser')
    a = list(range(0,2000,1))
    #while IndexError(True):
    try:
        for i in a:
            financial_dir[ticker]=temp.append(soup.find('div', {'class' : "D(tbrg)"}).find_all('div')[i].get_text(separator='|').split('|'))
    except:
        pass
temp
data5 = pd.DataFrame(temp)
financial_dir
'''
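The empty result can be reproduced without any scraping. list.append modifies the list in place and returns None, so each iteration assigns None to the column financial_dir[ticker]; on a DataFrame that has no rows, that creates the column but no data, while the actual rows pile up in temp. A minimal sketch, with a made-up row standing in for the scraped values:

```python
import pandas as pd

financial_dir = pd.DataFrame()   # starts with no rows and no columns
temp = []

# hypothetical stand-in for one scraped row such as ['Total Revenue', '126,653,800']
result = temp.append(['Total Revenue', '126,653,800'])

# list.append returned None, so this assigns None to a frame with zero rows
financial_dir['TATACONSUM.NS'] = result

print(result)         # None
print(temp)           # the scraped row ended up here
print(financial_dir)  # the column exists, but there are no rows
```

Keeping one DataFrame per ticker in a plain dict (financial_dir = {} and then financial_dir[ticker] = some_dataframe) gives the financial_dir['INDUSINDBK.NS'] lookup described above.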
Answer:
Try this:
- create a function that returns one DataFrame per ticker:
import requests
from bs4 import BeautifulSoup
import pandas as pd

def f(ticker):
    url = 'https://finance.yahoo.com/quote/' + ticker + '/financials?p=' + ticker
    page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'})
    soup = BeautifulSoup(page.text, 'html.parser')
    # header cells: the first one labels the row-name column, the rest are the periods
    ticker_header = [i.text for i in soup.find('div', {'class' : "D(tbhg)"}).find('div', {'class' : 'D(tbr)'}).find_all('div', {'class': 'D(ib)'})]
    # every value cell of the table body, read row by row
    values = [i.text for i in soup.find('div', {'class' : "D(tbrg)"}).find_all('div', {'class': 'Ta(c)'})]
    # row labels such as 'Total Revenue'
    ticker_index = [i.text for i in soup.find('div', {'class' : "D(tbrg)"}).find_all('div', {'class': 'D(ib)'})]
    # each row holds one value per period column (5 on the pages shown below)
    chunk_size = len(ticker_header) - 1
    list_chunked = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    df = pd.DataFrame(list_chunked, columns=ticker_header[1:])
    df_index = pd.Index(ticker_index)
    df = df.set_index(df_index)
    df['ticker'] = ticker
    df = df.reset_index()  # moves the row labels into a column named 'index'
    return df
f('TATACONSUM.NS')  # returns a DataFrame
index ttm 3/31/2022 3/31/2021 3/31/2020 3/31/2019 ticker
0 Total Revenue 126,653,800 123,470,100 115,832,200 95,966,000 72,093,500 TATACONSUM.NS
1 Cost of Revenue 74,531,800 73,265,100 70,742,800 55,775,900 41,540,400 TATACONSUM.NS
2 Gross Profit 52,122,000 50,205,000 45,089,400 40,190,100 30,553,100 TATACONSUM.NS
3 Operating Expense 37,051,000 35,650,800 32,199,200 29,685,700 24,003,600 TATACONSUM.NS
#...
f('HINDALCO.NS')  # returns a DataFrame
index ttm 3/31/2022 3/31/2021 3/31/2020 3/31/2019 ticker
0 Total Revenue 2,104,160,000 1,937,560,000 1,310,090,000 1,171,400,000 1,297,455,700 HINDALCO.NS
1 Cost of Revenue 1,531,870,000 1,398,820,000 953,430,000 859,720,000 958,279,000 HINDALCO.NS
2 Gross Profit 572,290,000 538,740,000 356,660,000 311,680,000 339,176,700 HINDALCO.NS
3 Operating Expense 312,010,000 298,540,000 240,410,000 215,740,000 230,666,900 HINDALCO.NS
#...
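The reshaping inside f can be checked offline, without requesting the page. The sketch below assumes the three lists have already been scraped; build_frame is a hypothetical helper that repeats f's chunking, indexing and reset_index steps, and the sample values are the first two TATACONSUM.NS rows shown above. The 'Breakdown' header cell is a placeholder assumption for whatever the first header cell contains, since f discards it with ticker_header[1:].

```python
import pandas as pd

def build_frame(ticker, header, values, row_labels):
    """Hypothetical helper: reshape flat scraped lists the way f() does."""
    n_cols = len(header) - 1  # the first header cell labels the row-name column
    # split the flat list of cells into rows of n_cols values each
    rows = [values[i:i + n_cols] for i in range(0, len(values), n_cols)]
    df = pd.DataFrame(rows, columns=header[1:], index=pd.Index(row_labels))
    df['ticker'] = ticker
    return df.reset_index()  # row labels become a column named 'index'

# 'Breakdown' is an assumed placeholder for the first header cell
header = ['Breakdown', 'ttm', '3/31/2022', '3/31/2021', '3/31/2020', '3/31/2019']
values = ['126,653,800', '123,470,100', '115,832,200', '95,966,000', '72,093,500',
          '74,531,800', '73,265,100', '70,742,800', '55,775,900', '41,540,400']
row_labels = ['Total Revenue', 'Cost of Revenue']

df = build_frame('TATACONSUM.NS', header, values, row_labels)
print(df)
```

n_cols is derived from the header length rather than fixed at 5, so the rows stay aligned if the table shows a different number of periods.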
- then you can save each ticker to a separate CSV file and work with each one separately:
tickers = ['KOTAKBANK.NS','WIPRO.NS','HINDALCO.NS','RELIANCE.NS',
'INDUSINDBK.NS','HDFCLIFE.NS','TATACONSUM.NS','TITAN.NS',
'ULTRACEMCO.NS']
for ticker in tickers:
    f(ticker).to_csv(f'{ticker}.csv', index=False)
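To work with the saved files later, they can be read back into a dict keyed by ticker, which also gives the financial_dir['INDUSINDBK.NS'] style of lookup from the question. A minimal sketch, assuming the files were written with f(ticker).to_csv(f'{ticker}.csv', index=False); two tiny hypothetical frames are written to a temporary directory so the example runs without scraping:

```python
import os
import tempfile

import pandas as pd

# hypothetical stand-ins for f(ticker); real frames come from the scraper
frames = {
    'TATACONSUM.NS': pd.DataFrame({'index': ['Total Revenue'], 'ttm': ['126,653,800'],
                                   'ticker': ['TATACONSUM.NS']}),
    'HINDALCO.NS': pd.DataFrame({'index': ['Total Revenue'], 'ttm': ['2,104,160,000'],
                                 'ticker': ['HINDALCO.NS']}),
}

with tempfile.TemporaryDirectory() as folder:
    # same naming scheme as the loop above: one <ticker>.csv per ticker
    for ticker, df in frames.items():
        df.to_csv(os.path.join(folder, f'{ticker}.csv'), index=False)

    # read every file back and key the resulting DataFrames by ticker
    financial_dir = {
        ticker: pd.read_csv(os.path.join(folder, f'{ticker}.csv'))
        for ticker in frames
    }

print(financial_dir['HINDALCO.NS'])
```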
- or you can put them in one dataframe:
tickers = ['KOTAKBANK.NS','WIPRO.NS','HINDALCO.NS','RELIANCE.NS',
'INDUSINDBK.NS','HDFCLIFE.NS','TATACONSUM.NS','TITAN.NS',
'ULTRACEMCO.NS']
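all_dataframes = []
for ticker in tickers:
    print(ticker)
    all_dataframes.append(f(ticker))  # scrape one ticker and keep its DataFrame
# ignore_index=True renumbers the rows 0..n-1 across all tickers
df_all = pd.concat(all_dataframes, ignore_index=True)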
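df_all keeps every ticker's rows in one table, so one ticker can be selected with a boolean mask on the ticker column, or the combined frame can be split back into a dict keyed by ticker with groupby. A small offline sketch; the hypothetical frames below stand in for the f(ticker) results and reuse values from the outputs above:

```python
import pandas as pd

# hypothetical stand-ins for f(ticker) results
all_dataframes = [
    pd.DataFrame({'index': ['Total Revenue', 'Cost of Revenue'],
                  'ttm': ['126,653,800', '74,531,800'],
                  'ticker': 'TATACONSUM.NS'}),
    pd.DataFrame({'index': ['Total Revenue', 'Cost of Revenue'],
                  'ttm': ['2,104,160,000', '1,531,870,000'],
                  'ticker': 'HINDALCO.NS'}),
]
# ignore_index=True renumbers rows 0..n-1 instead of repeating 0, 1 per ticker
df_all = pd.concat(all_dataframes, ignore_index=True)

# option 1: boolean mask on the ticker column
hindalco = df_all[df_all['ticker'] == 'HINDALCO.NS']

# option 2: a dict of per-ticker DataFrames, indexable like financial_dir['HINDALCO.NS']
financial_dir = {ticker: group.reset_index(drop=True)
                 for ticker, group in df_all.groupby('ticker')}

print(financial_dir['HINDALCO.NS'])
```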
- and you can also pivot the dataframe you got:
df_all.pivot(index='ticker', columns='index', values=['ttm', '3/31/2022', '3/31/2021', '3/31/2020', '3/31/2019'])
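With a list passed as values, pivot returns MultiIndex columns: the first level is the period column name and the second is the row label from the 'index' column, so each cell is addressed as (period, row label). This needs every (ticker, index) pair to appear only once; otherwise pivot raises a ValueError. A small offline check, using two period columns and values copied from the f() outputs above:

```python
import pandas as pd

# two tickers x two rows, values copied from the f() outputs above
df_all = pd.DataFrame({
    'index': ['Total Revenue', 'Cost of Revenue'] * 2,
    'ttm': ['126,653,800', '74,531,800', '2,104,160,000', '1,531,870,000'],
    '3/31/2022': ['123,470,100', '73,265,100', '1,937,560,000', '1,398,820,000'],
    'ticker': ['TATACONSUM.NS'] * 2 + ['HINDALCO.NS'] * 2,
})

wide = df_all.pivot(index='ticker', columns='index', values=['ttm', '3/31/2022'])

# one row per ticker; columns are (period, row label) pairs
print(wide.loc['HINDALCO.NS', ('ttm', 'Total Revenue')])
```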