What I am trying to achieve here is to compute the NetChgAvg for each row, based on a previous row value. Here is the formula:
NetChgAvg = NetChgAvg[-1] SF * ( Returns - NetChgAvg[-1] )
and here is a part of the data set:
(The index starts at 15 and the value 15 is computed using the previous n=length entries. I removed those indexes for convenience)
df.head(5)
Out[11]:
Date Close NetChgAvg TotChgAvg RSI Returns
15 09/22/2021 342.50 -0.789286 1.953571 29.798903 3.39
16 09/23/2021 347.56 NaN NaN NaN 5.06
17 09/24/2021 347.87 NaN NaN NaN 0.31
18 09/27/2021 348.61 NaN NaN NaN 0.74
19 09/28/2021 342.92 NaN NaN NaN -5.69
Currently I'm computing this with the following for loop:
for i in range((length 2), df.index[-1] 1):
df.loc[i, 'NetChgAvg'] = df.loc[i-1, 'NetChgAvg'] smoothing_factor * (df.loc[i, 'Returns'] - df.loc[i-1, 'NetChgAvg'])
df
Which results in the following data frame:
df.head(5)
Out[11]:
Date Close NetChgAvg TotChgAvg RSI Returns
15 09/22/2021 342.50 -0.789286 1.953571 29.798903 3.39
16 09/23/2021 347.56 -0.371480 NaN NaN 5.06
17 09/24/2021 347.87 -0.322802 NaN NaN 0.31
18 09/27/2021 348.61 -0.246888 NaN NaN 0.74
19 09/28/2021 342.92 -0.635682 NaN NaN -5.69
The values are correct, however this computation is slow. Considering the size of data sets I intend to run through this, I can expect computation time to lead into hours. Is there any way I can vectorize this, since each NetChgAvg is based on the previous value? If not, what alternatives do I have to speed up this for loop (maybe Numba)?
CodePudding user response:
To reference the previous row in a vectorized operation, you can use df.shift
. The formula you described initially could be written as:
NetChgAvg_shifted = df['NetChgAvg'].shift(-1)
df['NetChgAvg'] = NetChgAvg_shifted smoothing_factor * (df['Returns'] - NetChgAvg_shifted)
CodePudding user response:
I found that, given the incremental nature of the calculation, it looks like I can't vectorize. I had luck speeding up the loop by replacing .loc lookups with .at after seeing that it's preferred for scalar lookups. Here's the speed improvements for anyone interested:
%timeit netChgAvg() 19.7 ms ± 41.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit at_netChgAvg() 1.82 ms ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)