My goal is to calculate the volume-weighted daily return per ticker, using the formula volume * daily return / cumulative volume. Since this should be computed per ticker, I grouped by ticker and then date. Here is the code I have right now:
stock_data['VWDR'] = stock_data.groupby(['Ticker','Date'])[['Volume', 'DailyReturn']].sum().apply(lambda df: df['Volume']*df['DailyReturn']/ df['Volume'].cumsum())
Here's the error message
KeyError: 'Volume'
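A likely reason for the error, sketched on made-up data (the `toy` frame and its values are hypothetical, not the real download): `groupby(...).sum()` returns a DataFrame, and `DataFrame.apply` with its default `axis=0` passes each column to the lambda as a Series. Inside the lambda, `df['Volume']` is therefore a label lookup in a Series indexed by (Ticker, Date), where no such label exists.

```python
import pandas as pd

# Hypothetical toy data standing in for stock_data
toy = pd.DataFrame({
    "Ticker": ["A", "A", "B", "B"],
    "Date": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-03", "2022-01-04"]),
    "Volume": [100, 300, 50, 150],
    "DailyReturn": [0.01, 0.02, -0.01, 0.03],
})

# Same shape of call as in the question: one summed row per (Ticker, Date)
summed = toy.groupby(["Ticker", "Date"])[["Volume", "DailyReturn"]].sum()

try:
    # apply() hands the lambda one column at a time (a Series), not the frame,
    # so 'Volume' is looked up in the (Ticker, Date) index and is not found
    summed.apply(lambda df: df["Volume"] * df["DailyReturn"] / df["Volume"].cumsum())
    error = None
except KeyError as exc:
    error = exc

print(type(error).__name__, error)
```

Grouping by both Ticker and Date also leaves one row per group in this kind of data, so the `sum()` does not aggregate anything useful here.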
The code below builds the test data:
import pandas as pd
import yfinance as yf
# now just read the html to get all the S&P500 tickers
dataload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S&P_500_companies')
df = dataload[0]
# now get the first column(tickers) from the above data
# convert it into a list
ticker_list = df['Symbol'][25:35].values.tolist()
all_tickers = " ".join(ticker_list)
# get all the tickers from yfinance
tickers = yf.Tickers(all_tickers)
# set a start and end date to get two-years info
# group by the ticker
hist = tickers.history(start='2020-05-01', end='2022-05-01', group_by='ticker')
stock_data = pd.DataFrame(hist.stack(level=0).reset_index().rename(columns = {'level_1':'Ticker'}))
stock_data['DailyReturn'] = stock_data.sort_values(['Ticker', 'Date']).groupby('Ticker')['Close'].pct_change()
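To see what the `DailyReturn` line does, here is a small sketch on made-up prices (the `toy` frame is hypothetical). The sort only affects the order in which `pct_change` walks the rows within each ticker; the result keeps the original row labels, so the assignment lines up with the unsorted frame again.

```python
import pandas as pd

# Hypothetical prices, interleaved by ticker like the stacked download
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-03", "2022-01-03", "2022-01-04", "2022-01-04"]),
    "Ticker": ["A", "B", "A", "B"],
    "Close": [100.0, 50.0, 110.0, 45.0],
})

# Percentage change of Close within each ticker, in date order;
# assignment aligns on the index, so rows land back in their original positions
toy["DailyReturn"] = (
    toy.sort_values(["Ticker", "Date"]).groupby("Ticker")["Close"].pct_change()
)
print(toy)
```

The first row of each ticker has no previous close, so its return is NaN.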
If I extract a single ticker from the stock data table, the calculation works fine:
AMZN = stock_data[stock_data.Ticker=='AMZN'].copy()
AMZN['VWDR'] = AMZN['Volume'] * AMZN['DailyReturn']/ AMZN['Volume'].cumsum()
But I am not sure what I've done wrong in the groupby code. Is there a simpler way to achieve this?
CodePudding user response:
I created a function, func_data, that performs the calculation for each ticker group. The result is placed in the 'test' column, which is first created filled with NaN values.
import numpy as np

stock_data['test'] = np.nan

def func_data(x):
    # x holds the rows of a single ticker
    x['test'] = x['Volume'] * x['DailyReturn'] / x['Volume'].cumsum()
    return x

# group_keys=False keeps the original row index so the result aligns on assignment
stock_data['test'] = stock_data.groupby(['Ticker'], group_keys=False).apply(func_data).iloc[:, -1]
print(AMZN)
print(stock_data)
Output
Date Ticker Close ... Volume DailyReturn test
0 2022-02-28 GOOG 2697.820068 ... 1483800 NaN NaN
1 2022-02-28 MO 50.422642 ... 8646400 NaN NaN
2 2022-03-01 GOOG 2683.360107 ... 1232000 -0.005360 -0.002431
3 2022-03-01 MO 50.697903 ... 9693000 0.005459 0.002885
4 2022-03-02 GOOG 2695.030029 ... 1198300 0.004349 0.001331
.. ... ... ... ... ... ... ...
83 2022-04-27 MO 54.919998 ... 7946600 0.000729 0.000015
84 2022-04-28 GOOG 2388.229980 ... 1839500 0.038176 0.001172
85 2022-04-28 MO 55.200001 ... 8153900 0.005098 0.000106
86 2022-04-29 GOOG 2299.330078 ... 1683500 -0.037224 -0.001017
87 2022-04-29 MO 55.570000 ... 10861600 0.006703 0.000180
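A possible shorter variant without a helper function, shown on made-up numbers (the `toy` frame is hypothetical): only the cumulative volume has to be computed per ticker, and `groupby('Ticker')['Volume'].cumsum()` returns it aligned with the original rows. This sketch assumes the rows are already in date order within each ticker.

```python
import pandas as pd

# Hypothetical toy data: two tickers, three days each, interleaved by date
toy = pd.DataFrame({
    "Ticker": ["A", "B", "A", "B", "A", "B"],
    "Volume": [100, 200, 300, 200, 100, 400],
    "DailyReturn": [float("nan"), float("nan"), 0.02, -0.01, 0.05, 0.03],
})

# Running volume total within each ticker, aligned to the original rows
cum_volume = toy.groupby("Ticker")["Volume"].cumsum()

# volume * daily return / cumulative volume, evaluated row by row
toy["VWDR"] = toy["Volume"] * toy["DailyReturn"] / cum_volume

# Cross-check against the single-ticker calculation from the question
only_a = toy[toy["Ticker"] == "A"]
expected_a = only_a["Volume"] * only_a["DailyReturn"] / only_a["Volume"].cumsum()
print(toy)
```

On the real frame the same two lines would use `stock_data` in place of `toy`.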
CodePudding user response:
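Add this. It turns off SSL certificate verification, which works around certificate errors when read_html fetches the page.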
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
Before this.
dataload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S&P_500_companies')
I did that, and got this result.
Date Ticker Close ... Stock Splits Volume DailyReturn
0 2020-05-01 AAL 10.640000 ... 0 99441400 NaN
1 2020-05-01 AEE 67.797997 ... 0 1520200 NaN
2 2020-05-01 AEP 75.347603 ... 0 2742100 NaN
3 2020-05-01 AMCR 7.925522 ... 0 4097600 NaN
4 2020-05-01 AMD 49.880001 ... 0 69562700 NaN
... ... ... ... ... ... ...
5035 2022-04-29 AMT 241.020004 ... 0 2151900 -0.044254
5036 2022-04-29 AMZN 2485.629883 ... 0 13616500 -0.140494
5037 2022-04-29 AXP 174.710007 ... 0 3210100 -0.039949
5038 2022-04-29 GOOG 2299.330078 ... 0 1683500 -0.037224
5039 2022-04-29 MO 55.570000 ... 0 10861600 0.006703
[5040 rows x 10 columns]
Then.
stock_data['VWDR'] = (stock_data['DailyReturn'].cumsum()*stock_data['Volume'].cumsum())/stock_data['Volume'].cumsum()
stock_data['VWDR']
Reference.
https://analyzingalpha.com/vwap
All code.
import pandas as pd
import yfinance as yf
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
# now just read the html to get all the S&P500 tickers
dataload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S&P_500_companies')
df = dataload[0]
# now get the first column(tickers) from the above data
# convert it into a list
ticker_list = df['Symbol'][25:35].values.tolist()
all_tickers = " ".join(ticker_list)
# get all the tickers from yfinance
tickers = yf.Tickers(all_tickers)
# set a start and end date to get two-years info
# group by the ticker
hist = tickers.history(start='2020-05-01', end='2022-05-01', group_by='ticker')
stock_data = pd.DataFrame(hist.stack(level=0).reset_index().rename(columns = {'level_1':'Ticker'}))
stock_data['DailyReturn'] = stock_data.sort_values(['Ticker', 'Date']).groupby('Ticker')['Close'].pct_change()
stock_data['VWDR'] = (stock_data['DailyReturn'].cumsum()*stock_data['Volume'].cumsum())/stock_data['Volume'].cumsum()
stock_data['VWDR']
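One thing worth checking about the VWDR line: `Volume.cumsum()` appears in both the numerator and the denominator, so it cancels, and the column ends up being the running sum of `DailyReturn` over the whole frame rather than a per-ticker volume weighting. A minimal sketch on made-up numbers (the `toy` frame is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two interleaved tickers
toy = pd.DataFrame({
    "Ticker": ["A", "B", "A", "B"],
    "Volume": [100, 200, 300, 200],
    "DailyReturn": [0.01, -0.02, 0.02, 0.03],
})

# The expression as written in the listing
as_written = (toy["DailyReturn"].cumsum() * toy["Volume"].cumsum()) / toy["Volume"].cumsum()

# Plain running sum of the returns, ignoring volume and ticker
running_return = toy["DailyReturn"].cumsum()

# True when the two agree up to floating-point rounding
same = np.allclose(as_written, running_return)
print(same)
```

If the per-ticker formula from the question is what is wanted, the cumulative volume would need to be taken within each ticker, for example with `groupby('Ticker')['Volume'].cumsum()`.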