Home > Net >  Cumulative product for each element over x days in pandas DataFrame
Cumulative product for each element over x days in pandas DataFrame

Time:05-23

I try to calculate for each element in a Dataframe df1 the cumulative product over x days (for example 3). Is there an efficient way to do that? So far, I only know to generate a normal cumulative product over the rows, without fixing the lookback period to x days (=df2). For example:

  • 2022-01-02 ID1: 1.0528 = (ignoring NaN) 0.94 * 1.12
  • 2022-01-05 ID1: 1.2002 = 0.94 * 1.12 * 1.14
  • 2022-01-09 ID1: 1.4045 = 1.12 * 1.14 * 1.10

Optimally, it would also ignore NaN values and calculate the cumulative return over the remaining numbers. So far I only know how to compute that by creating a copy and shifting it, but I would like to do calculate the cumulative product for many columns over 50 rows, which would not be efficient.

df1:
            ID1     ID2
Date        
2022-01-02  NaN     0.95
2022-01-05  0.94    0.98
2022-01-09  1.12    NaN
2022-01-10  1.14    1.02
2022-01-11  1.10    1.00
2022-01-12  0.92    0.82

df2:
            ID1     ID2
Date        
2022-01-02  1.0528  0.9309
2022-01-05  1.2002  0.9996
2022-01-09  1.4045  1.0200
2022-01-10  1.1537  0.8364
2022-01-11  1.0120  0.8200
2022-01-12  0.9200  0.8200

For reproducability:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({
    'Date':['2022-01-02', '2022-01-05', '2022-01-09', '2022-01-10', '2022-01-11', '2022-01-12'],
    'ID1':[np.nan, 0.94, 1.12, 1.14, 1.1, 0.92], 
    'ID2':[0.95, 0.98, np.nan, 1.02, 1, 0.82]})
df1 = df1.set_index('Date')

Thanks a lot for you suggestion!

CodePudding user response:

You can use a custom rolling indexer for forward-looking windows and np.nanprod:

import numpy as np

# Window of index i includes rows [i:i 3).
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)

print(df.rolling(indexer, min_periods=1).apply(np.nanprod))
                 ID1     ID2
Date                        
2022-01-02  1.052800  0.9310
2022-01-05  1.200192  0.9996
2022-01-09  1.404480  1.0200
2022-01-10  1.153680  0.8364
2022-01-11  1.012000  0.8200
2022-01-12  0.920000  0.8200

CodePudding user response:

You can use:

df1.fillna(1)[::-1].rolling(window=3, min_periods=1).agg(lambda x: x.prod())[::-1]

output:

                 ID1     ID2
Date                        
2022-01-02  1.052800  0.9310
2022-01-05  1.200192  0.9996
2022-01-09  1.404480  1.0200
2022-01-10  1.153680  0.8364
2022-01-11  1.012000  0.8200
2022-01-12  0.920000  0.8200

How does it work?

  • rolling uses the previous rows (or is centered), here we reverse the array to compute an inverse rolling
  • We fill the NaNs with 1 (we could also use np.nanprod)
  • We use min_periods=1 to enable computation on less than 3 elements
  • Related