Pandas dataframe sum of row won't let me use result in equation

Time:09-29

Can anybody help me understand why the code below doesn't work?

import yfinance as yf
start_date = '1990-01-01'
ticker_list = ['SPY', 'QQQ', 'IWM', 'GLD']
# Keep only closing prices and drop dates where any ticker is missing
tickers = yf.download(ticker_list, start=start_date)['Close'].dropna()


ticker_vol_share = (tickers.pct_change().rolling(20).std()) \
                    / ((tickers.pct_change().rolling(20).std()).sum(axis=1))

Both (tickers.pct_change().rolling(20).std()) and ((tickers.pct_change().rolling(20).std()).sum(axis=1)) run fine by themselves, but when run together they form a DataFrame with thousands of columns, all filled with NaN.

CodePudding user response:

Try this.

rolling_std = tickers.pct_change().rolling(20).std()
ticker_vol_share = rolling_std.apply(lambda row: row / sum(row), axis=1)

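As a rough check of this suggestion, the sketch below applies the same row-wise normalization to synthetic prices instead of a yfinance download. The random data, the seed, and the names prices and vol_share are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic prices standing in for the downloaded closes
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=60)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(60, 4)), axis=0)),
    index=dates,
    columns=["SPY", "QQQ", "IWM", "GLD"],
)

rolling_std = prices.pct_change().rolling(20).std()

# Divide every row by its own total, one row at a time
vol_share = rolling_std.apply(lambda row: row / sum(row), axis=1)

# Rows before the 20-observation window fills are NaN; drop them to inspect
valid = vol_share.dropna()
print(valid.tail())
print(valid.sum(axis=1).head())
```

The columns stay as the four tickers, and each fully populated row should sum to 1 up to floating-point error. apply with axis=1 calls the lambda once per row, so it can be slow on long histories.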

CodePudding user response:

Why it's not working as expected:

  • Your tickers object is a DataFrame, as are tickers.pct_change() and tickers.pct_change().rolling(20).std(). (The intermediate tickers.pct_change().rolling(20) is a Rolling object, not a DataFrame.) The tickers.pct_change().rolling(20).std().sum(axis=1) is a Series indexed by date.
  • You're therefore doing element-wise division of a DataFrame by a Series. This yields a DataFrame.
  • The top and bottom portions of the division have different shapes, since sum(axis=1) collapses the column axis, and that is the most likely reason the output DF is filled with NaN. With the / operator, pandas matches the Series index against the DataFrame's columns. Here the Series is indexed by date and the columns are ticker names, so no labels match, and the result gets one column for every ticker plus one for every date, all NaN. Separately, NaN also appears in the first rows, where the rolling standard deviation does not yet have a full 20-observation window, and anywhere the calculation ends up as 0/0.
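The shape mismatch described above can be reproduced on a small synthetic frame. This is a minimal sketch, not the asker's data: the random prices, the seed, and the names prices, wrong and share are assumptions. It shows the plain / operator producing extra all-NaN columns, and DataFrame.div with axis=0 as one way to divide each row by its own sum.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic prices standing in for the downloaded closes
rng = np.random.default_rng(1)
dates = pd.bdate_range("2020-01-01", periods=60)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(60, 4)), axis=0)),
    index=dates,
    columns=["SPY", "QQQ", "IWM", "GLD"],
)

rolling_std = prices.pct_change().rolling(20).std()
row_total = rolling_std.sum(axis=1)  # Series indexed by date

# "/" matches the Series index (dates) against the columns (tickers).
# Nothing matches, so the columns become tickers + dates, all NaN.
wrong = rolling_std / row_total
print(wrong.shape)

# axis=0 matches the Series index against the row index instead,
# so each row is divided by its own total.
share = rolling_std.div(row_total, axis=0)
print(share.shape)
print(share.dropna().sum(axis=1).head())
```

pandas may emit a warning about sorting mixed labels when it builds the combined columns for wrong. The div call gives the same kind of result as a row-wise apply, but as a single vectorized operation.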

It's not clear to me what your expected output is, so I'll defer to others or wait for an update before answering that part.
