Home > Net >  Compute product with rolling window
Compute product with rolling window

Time:04-17

I have a df:

month A B C D
1994-07 1 2 NAN NAN
1994-08 5 2 3 4
1994-09 1 2 1 1
1994-10 1 2 3 1
1994-11 1 NAN 3 1
1995-07 1 2 2 4
1995-08 1 2 3 4

I want, for each column, to get the product of a rolling window of size 5, ignoring NAN values. Which means, in this case:

month A B C D
1994-11 5 16 27 4
1995-07 5 16 54 16
1995-08 1 16 54 16

For D(1994-11), for example, I would get 4 (4111), and C (1995-07) results in 54 (2331*3). I have tried:

df = df.rolling(window=5,axis=0).apply(prod(min_count=1))

It is an attempt of applying the function product from pandas.

But I get the error "NameError: name 'prod' is not defined"

CodePudding user response:

I would use numpy product here. NAN will be a string when we copy/paste it into our from your sample, so I went ahead and replaced it with np.nan. From there you can replace NaN with 1, since it's a product if won't change the outcome, but will fulfill the rolling period requirement.

import pandas as pd
import numpy as np
df = pd.DataFrame({'month': ['1994-07',
  '1994-08',
  '1994-09',
  '1994-10',
  '1994-11',
  '1995-07',
  '1995-08'],
 'A': [1, 5, 1, 1, 1, 1, 1],
 'B': ['2', '2', '2', '2', 'NAN', '2', '2'],
 'C': ['NAN', '3', '1', '3', '3', '2', '3'],
 'D': ['NAN', '4', '1', '1', '1', '4', '4']})


(
    df.replace('NAN',np.nan)
      .set_index('month')
      .fillna(1)
      .rolling(5)
      .apply(np.product)
      .dropna()
      .reset_index()
)

Output

     month    A     B     C     D
0  1994-11  5.0  16.0  27.0   4.0
1  1995-07  5.0  16.0  54.0  16.0
2  1995-08  1.0  16.0  54.0  16.0

Another option if you are worried about entirely null columns:

import pandas as pd
import numpy as np

window = 5

df = pd.DataFrame({'month': ['1994-07',
  '1994-08',
  '1994-09',
  '1994-10',
  '1994-11',
  '1995-07',
  '1995-08'],
 'A': [1, 5, 1, 1, 1, 1, 1],
 'B': ['2', '2', '2', '2', 'NAN', '2', '2'],
 'C': ['NAN', '3', '1', '3', '3', '2', '3'],
 'D': ['NAN', '4', '1', '1', '1', '4', '4']})

df['C'] = np.nan
(
    df.replace('NAN',np.nan)
      .set_index('month')
      .rolling(window=window, min_periods=1)
      .apply(np.nanprod)
      .reset_index()
    
).iloc[window-1:]

Output

    month    A     B   C     D
4  1994-11  5.0  16.0 NaN   4.0
5  1995-07  5.0  16.0 NaN  16.0
6  1995-08  1.0  16.0 NaN  16.0

CodePudding user response:

In case somebody comes to this: I have solved in the following way: df = df.rolling(window=5, min_periods=1).apply(lambda x: np.prod(1 x) - 1)

  • Related