How to make multiple columns in one column in pandas for the data appended from a list-CodePudding

I am scraping data from yahoo finance all data scraping is working fine. But when I want to store the appended list into an indexable dataframe it returns a blank dataframe however, when I store the data in a non-indexable dataframe it store the data.

When I print temp I can see the data even if I convert temp into a dataframe it gets converted successfully. But when I run financial_dir[ticker]=temp.append(soup.find('div', {'class' : "D(tbrg)"}).find_all('div')[i].get_text(separator='|').split('|')) it does not create an indexable dataframe it runs an empty dataframe.

I want to create financial_dir like this which is callable for different stocks for example when I run financial_dir['INDUSINDBK.NS'] it should give the dataframe for INDUSINDBK.NS like the image. Any help will be extremely appreciated

'''

import requests
from bs4 import BeautifulSoup
import pandas as pd

tickers = ['KOTAKBANK.NS','WIPRO.NS','HINDALCO.NS','RELIANCE.NS',
           'INDUSINDBK.NS','HDFCLIFE.NS','TATACONSUM.NS','TITAN.NS',
           'ULTRACEMCO.NS']

financial_dir = pd.DataFrame()
temp = []
for ticker in tickers:
    url = 'https://finance.yahoo.com/quote/' ticker '/financials?p=' ticker
    page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'})
#page_content = page.content
    soup = BeautifulSoup(page.text, 'html.parser')
    a = list(range(0,2000,1))
#while IndexError(True):
    try:
        for i in a:
            financial_dir[ticker]=temp.append(soup.find('div', {'class' : "D(tbrg)"}).find_all('div')[i].get_text(separator='|').split('|'))
    except:
        pass


temp
data5 = pd.DataFrame(temp)
financial_dir

'''

CodePudding user response：

try this:

create function to return one dataframe per ticker:

def f(ticker):
    url = 'https://finance.yahoo.com/quote/' ticker '/financials?p=' ticker
    page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'})
    soup = BeautifulSoup(page.text, 'html.parser')
    ticker_header = [i.text for i in soup.find('div', {'class' : "D(tbhg)"}).find('div', {'class' : 'D(tbr)'}).find_all('div', {'class': 'D(ib)'})]
    values = [i.text for i in soup.find('div', {'class' : "D(tbrg)"}).find_all('div', {'class': 'Ta(c)'})]
    ticker_index = [i.text for i in soup.find('div', {'class' : "D(tbrg)"}).find_all('div', {'class': 'D(ib)'})]
    chunk_size = 5
    list_chunked = [values[i:i   chunk_size] for i in range(0, len(values), chunk_size)]
    df = pd.DataFrame(list_chunked, columns=ticker_header[1:])
    df_index = pd.Index(ticker_index)
    df = df.set_index(df_index)
    df['ticker'] = ticker
    df = df.reset_index()
    return df


f('TATACONSUM.NS') #return dataframe
    index               ttm         3/31/2022   3/31/2021   3/31/2020   3/31/2019   ticker
0   Total Revenue       126,653,800 123,470,100 115,832,200 95,966,000  72,093,500  TATACONSUM.NS
1   Cost of Revenue     74,531,800  73,265,100  70,742,800  55,775,900  41,540,400  TATACONSUM.NS
2   Gross Profit        52,122,000  50,205,000  45,089,400  40,190,100  30,553,100  TATACONSUM.NS
3   Operating Expense   37,051,000  35,650,800  32,199,200  29,685,700  24,003,600  TATACONSUM.NS
#...

f('HINDALCO.NS') #return dataframe
    index               ttm             3/31/2022   3/31/2021   3/31/2020   3/31/2019   ticker
0   Total Revenue       2,104,160,000   1,937,560,000   1,310,090,000   1,171,400,000   1,297,455,700   HINDALCO.NS
1   Cost of Revenue     1,531,870,000   1,398,820,000   953,430,000 859,720,000 958,279,000 HINDALCO.NS
2   Gross Profit        572,290,000     538,740,000 356,660,000 311,680,000 339,176,700 HINDALCO.NS
3   Operating Expense   312,010,000     298,540,000 240,410,000 215,740,000 230,666,900 HINDALCO.NS
#...

then you can save each ticket in separate csv file and work with each one separately:

tickers = ['KOTAKBANK.NS','WIPRO.NS','HINDALCO.NS','RELIANCE.NS',
           'INDUSINDBK.NS','HDFCLIFE.NS','TATACONSUM.NS','TITAN.NS',
           'ULTRACEMCO.NS']

for ticker in tickers:
    f(ticker).to_csv(f'{ticker}.csv', index=False)

or you can put them in one dataframe:

tickers = ['KOTAKBANK.NS','WIPRO.NS','HINDALCO.NS','RELIANCE.NS',
           'INDUSINDBK.NS','HDFCLIFE.NS','TATACONSUM.NS','TITAN.NS',
           'ULTRACEMCO.NS']

all_dataframes = []
for ticker in tickers:
    print(ticker)
    all_dataframes.append(f(all_dataframes))

df_all = pd.concat(all_dataframes)

and you can also pivot the dataframe you got:

df_all.pivot(index='ticker', columns='index', values=[ 'ttm', '3/31/2022', '3/31/2021', '3/31/2020', '3/31/2019',])