I try to calculate for each element in a Dataframe
df1
the cumulative product over x
days (for example 3). Is there an efficient way to do that? So far, I only know to generate a normal cumulative product over the rows, without fixing the lookback period to x
days (=df2
). For example:
- 2022-01-02 ID1: 1.0528 = (ignoring NaN) 0.94 * 1.12
- 2022-01-05 ID1: 1.2002 = 0.94 * 1.12 * 1.14
- 2022-01-09 ID1: 1.4045 = 1.12 * 1.14 * 1.10
Optimally, it would also ignore NaN values and calculate the cumulative return over the remaining numbers. So far I only know how to compute that by creating a copy and shifting it, but I would like to do calculate the cumulative product for many columns over 50 rows, which would not be efficient.
df1:
ID1 ID2
Date
2022-01-02 NaN 0.95
2022-01-05 0.94 0.98
2022-01-09 1.12 NaN
2022-01-10 1.14 1.02
2022-01-11 1.10 1.00
2022-01-12 0.92 0.82
df2:
ID1 ID2
Date
2022-01-02 1.0528 0.9309
2022-01-05 1.2002 0.9996
2022-01-09 1.4045 1.0200
2022-01-10 1.1537 0.8364
2022-01-11 1.0120 0.8200
2022-01-12 0.9200 0.8200
For reproducability:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
'Date':['2022-01-02', '2022-01-05', '2022-01-09', '2022-01-10', '2022-01-11', '2022-01-12'],
'ID1':[np.nan, 0.94, 1.12, 1.14, 1.1, 0.92],
'ID2':[0.95, 0.98, np.nan, 1.02, 1, 0.82]})
df1 = df1.set_index('Date')
Thanks a lot for you suggestion!
CodePudding user response:
You can use a custom rolling indexer for forward-looking windows and np.nanprod
:
import numpy as np
# Window of index i includes rows [i:i 3).
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)
print(df.rolling(indexer, min_periods=1).apply(np.nanprod))
ID1 ID2
Date
2022-01-02 1.052800 0.9310
2022-01-05 1.200192 0.9996
2022-01-09 1.404480 1.0200
2022-01-10 1.153680 0.8364
2022-01-11 1.012000 0.8200
2022-01-12 0.920000 0.8200
CodePudding user response:
You can use:
df1.fillna(1)[::-1].rolling(window=3, min_periods=1).agg(lambda x: x.prod())[::-1]
output:
ID1 ID2
Date
2022-01-02 1.052800 0.9310
2022-01-05 1.200192 0.9996
2022-01-09 1.404480 1.0200
2022-01-10 1.153680 0.8364
2022-01-11 1.012000 0.8200
2022-01-12 0.920000 0.8200
How does it work?
- rolling uses the previous rows (or is centered), here we reverse the array to compute an inverse rolling
- We fill the NaNs with 1 (we could also use np.nanprod)
- We use min_periods=1 to enable computation on less than 3 elements