groupby and apply in pandas

Time:05-29

What I would like to achieve: calculate the volume-weighted daily return per ticker, where the formula is volume * daily return / cumulative volume. Because this should be computed per ticker, I grouped by ticker and then by date. Here is the code I have right now:

stock_data['VWDR'] = stock_data.groupby(['Ticker','Date'])[['Volume', 'DailyReturn']].sum().apply(lambda df: df['Volume']*df['DailyReturn']/ df['Volume'].cumsum())

Here's the error message

KeyError: 'Volume'

The code below fetches the test data:

import pandas as pd
import yfinance as yf
# now just read the html to get all the S&P500 tickers 
dataload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S&P_500_companies')
df = dataload[0]
# now get the first column(tickers) from the above data
# convert it into a list
ticker_list = df['Symbol'][25:35].values.tolist()
all_tickers = " ".join(ticker_list)
# get all the tickers from yfinance
tickers = yf.Tickers(all_tickers)
# set a start and end date to get two-years info
# group by the ticker
hist = tickers.history(start='2020-05-01', end='2022-05-01', group_by='ticker')
stock_data = pd.DataFrame(hist.stack(level=0).reset_index().rename(columns = {'level_1':'Ticker'}))
stock_data['DailyReturn'] = stock_data.sort_values(['Ticker', 'Date']).groupby('Ticker')['Close'].pct_change()

If I extract a single ticker from the stock data table, the calculation works fine:

AMZN = stock_data[stock_data.Ticker=='AMZN'].copy()
AMZN['VWDR'] = AMZN['Volume'] * AMZN['DailyReturn']/ AMZN['Volume'].cumsum()

But I am not sure what I've done wrong in the groupby code. Is there a simpler way to achieve this?
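
For context on the error itself, here is a minimal sketch that reproduces it on a tiny synthetic frame (the `toy` data and the `record` helper are made up for illustration, they are not part of the original post). It assumes the cause is that `DataFrame.apply` defaults to `axis=0`, so after `.sum()` the lambda receives each column as a Series indexed by (Ticker, Date) rather than a DataFrame, and `df['Volume']` is then looked up as a row label.

```python
import pandas as pd

# Tiny synthetic stand-in for stock_data (values are made up)
toy = pd.DataFrame({
    "Ticker": ["A", "A", "B", "B"],
    "Date": pd.to_datetime(["2022-01-03", "2022-01-04"] * 2),
    "Volume": [100, 300, 50, 150],
    "DailyReturn": [0.01, 0.02, -0.01, 0.03],
})

# Same shape as in the question: one row per (Ticker, Date) pair
summed = toy.groupby(["Ticker", "Date"])[["Volume", "DailyReturn"]].sum()

# Record what DataFrame.apply actually passes to the function
seen = []

def record(col):
    seen.append((col.name, type(col).__name__))
    return col

summed.apply(record)

# The lambda from the question indexes a Series with a column name,
# which is treated as a row label and is not found
try:
    summed.apply(lambda df: df["Volume"] * df["DailyReturn"] / df["Volume"].cumsum())
    raised = False
except KeyError:
    raised = True
```

Here `seen` holds column names paired with the type `Series`, which suggests the function never sees a frame that has both `Volume` and `DailyReturn` available at once.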

CodePudding user response:

I created the function 'func_data', which performs the calculation for one ticker group at a time. The result is placed in the 'test' column, which is first created with NaN values.

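import numpy as np

stock_data['test'] = np.nan

def func_data(x):
    # x holds the rows of a single ticker; work on a copy to avoid mutating the group
    x = x.copy()
    x['test'] = x['Volume'] * x['DailyReturn'] / x['Volume'].cumsum()
    return x

# group_keys=False keeps the original row index so the result aligns on assignment
stock_data['test'] = stock_data.groupby('Ticker', group_keys=False).apply(func_data)['test']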
print(AMZN)
print(stock_data)

Output

         Date Ticker        Close  ...    Volume  DailyReturn      test
0  2022-02-28   GOOG  2697.820068  ...   1483800          NaN       NaN
1  2022-02-28     MO    50.422642  ...   8646400          NaN       NaN
2  2022-03-01   GOOG  2683.360107  ...   1232000    -0.005360 -0.002431
3  2022-03-01     MO    50.697903  ...   9693000     0.005459  0.002885
4  2022-03-02   GOOG  2695.030029  ...   1198300     0.004349  0.001331
..        ...    ...          ...  ...       ...          ...       ...
83 2022-04-27     MO    54.919998  ...   7946600     0.000729  0.000015
84 2022-04-28   GOOG  2388.229980  ...   1839500     0.038176  0.001172
85 2022-04-28     MO    55.200001  ...   8153900     0.005098  0.000106
86 2022-04-29   GOOG  2299.330078  ...   1683500    -0.037224 -0.001017
87 2022-04-29     MO    55.570000  ...  10861600     0.006703  0.000180
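
The same per-ticker calculation can probably be written without `apply`. The sketch below assumes a small synthetic frame (`toy`, with made-up values) in place of `stock_data`, and relies on `groupby(...)['Volume'].cumsum()` returning a Series aligned to the original row index, so it can be combined directly with the ungrouped columns.

```python
import pandas as pd

# Tiny synthetic stand-in for stock_data, interleaved by date like the real data
toy = pd.DataFrame({
    "Ticker": ["A", "B", "A", "B"],
    "Date": pd.to_datetime(["2022-01-03", "2022-01-03", "2022-01-04", "2022-01-04"]),
    "Volume": [100, 50, 300, 150],
    "DailyReturn": [0.01, -0.01, 0.02, 0.03],
})

# Sort so the cumulative sum runs in date order within each ticker
toy = toy.sort_values(["Ticker", "Date"])

# Running volume that restarts for every ticker, aligned to toy's index
cum_volume = toy.groupby("Ticker")["Volume"].cumsum()

# volume * daily return / cumulative volume, per ticker
toy["VWDR"] = toy["Volume"] * toy["DailyReturn"] / cum_volume
```

Filtering `toy` down to one ticker and applying the single-ticker formula from the question should give the same values, which is a quick way to check the grouped version.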

CodePudding user response:

Add this.

import ssl
ssl._create_default_https_context = ssl._create_unverified_context

Before this.

dataload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S&P_500_companies')

I did that, and got this result.

           Date Ticker        Close  ...  Stock Splits    Volume  DailyReturn
0    2020-05-01    AAL    10.640000  ...             0  99441400          NaN
1    2020-05-01    AEE    67.797997  ...             0   1520200          NaN
2    2020-05-01    AEP    75.347603  ...             0   2742100          NaN
3    2020-05-01   AMCR     7.925522  ...             0   4097600          NaN
4    2020-05-01    AMD    49.880001  ...             0  69562700          NaN
        ...    ...          ...  ...           ...       ...          ...
5035 2022-04-29    AMT   241.020004  ...             0   2151900    -0.044254
5036 2022-04-29   AMZN  2485.629883  ...             0  13616500    -0.140494
5037 2022-04-29    AXP   174.710007  ...             0   3210100    -0.039949
5038 2022-04-29   GOOG  2299.330078  ...             0   1683500    -0.037224
5039 2022-04-29     MO    55.570000  ...             0  10861600     0.006703

[5040 rows x 10 columns]

Then.

# Cumulative volume must restart for each ticker, so group before taking cumsum
stock_data['VWDR'] = stock_data['Volume'] * stock_data['DailyReturn'] / stock_data.groupby('Ticker')['Volume'].cumsum()

stock_data['VWDR']

Reference.

https://analyzingalpha.com/vwap

All code.

import pandas as pd
import yfinance as yf

import ssl
ssl._create_default_https_context = ssl._create_unverified_context


# now just read the html to get all the S&P500 tickers 
dataload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S&P_500_companies')
df = dataload[0]
# now get the first column(tickers) from the above data
# convert it into a list
ticker_list = df['Symbol'][25:35].values.tolist()
all_tickers = " ".join(ticker_list)
# get all the tickers from yfinance
tickers = yf.Tickers(all_tickers)
# set a start and end date to get two-years info
# group by the ticker
hist = tickers.history(start='2020-05-01', end='2022-05-01', group_by='ticker')
stock_data = pd.DataFrame(hist.stack(level=0).reset_index().rename(columns = {'level_1':'Ticker'}))
stock_data['DailyReturn'] = stock_data.sort_values(['Ticker', 'Date']).groupby('Ticker')['Close'].pct_change()

# Cumulative volume must restart for each ticker, so group before taking cumsum
stock_data['VWDR'] = stock_data['Volume'] * stock_data['DailyReturn'] / stock_data.groupby('Ticker')['Volume'].cumsum()

stock_data['VWDR']