I have a df:
month | A | B | C | D |
---|---|---|---|---|
1994-07 | 1 | 2 | NAN | NAN |
1994-08 | 5 | 2 | 3 | 4 |
1994-09 | 1 | 2 | 1 | 1 |
1994-10 | 1 | 2 | 3 | 1 |
1994-11 | 1 | NAN | 3 | 1 |
1995-07 | 1 | 2 | 2 | 4 |
1995-08 | 1 | 2 | 3 | 4 |
I want, for each column, to get the product of a rolling window of size 5, ignoring NAN values. Which means, in this case:
month | A | B | C | D |
---|---|---|---|---|
1994-11 | 5 | 16 | 27 | 4 |
1995-07 | 5 | 16 | 54 | 16 |
1995-08 | 1 | 16 | 54 | 16 |
For D(1994-11), for example, I would get 4 (4111), and C (1995-07) results in 54 (2331*3). I have tried:
df = df.rolling(window=5,axis=0).apply(prod(min_count=1))
It is an attempt of applying the function product from pandas.
But I get the error "NameError: name 'prod' is not defined"
CodePudding user response:
I would use numpy product here. NAN will be a string when we copy/paste it into our from your sample, so I went ahead and replaced it with np.nan
. From there you can replace NaN
with 1, since it's a product if won't change the outcome, but will fulfill the rolling period requirement.
import pandas as pd
import numpy as np
df = pd.DataFrame({'month': ['1994-07',
'1994-08',
'1994-09',
'1994-10',
'1994-11',
'1995-07',
'1995-08'],
'A': [1, 5, 1, 1, 1, 1, 1],
'B': ['2', '2', '2', '2', 'NAN', '2', '2'],
'C': ['NAN', '3', '1', '3', '3', '2', '3'],
'D': ['NAN', '4', '1', '1', '1', '4', '4']})
(
df.replace('NAN',np.nan)
.set_index('month')
.fillna(1)
.rolling(5)
.apply(np.product)
.dropna()
.reset_index()
)
Output
month A B C D
0 1994-11 5.0 16.0 27.0 4.0
1 1995-07 5.0 16.0 54.0 16.0
2 1995-08 1.0 16.0 54.0 16.0
Another option if you are worried about entirely null columns:
import pandas as pd
import numpy as np
window = 5
df = pd.DataFrame({'month': ['1994-07',
'1994-08',
'1994-09',
'1994-10',
'1994-11',
'1995-07',
'1995-08'],
'A': [1, 5, 1, 1, 1, 1, 1],
'B': ['2', '2', '2', '2', 'NAN', '2', '2'],
'C': ['NAN', '3', '1', '3', '3', '2', '3'],
'D': ['NAN', '4', '1', '1', '1', '4', '4']})
df['C'] = np.nan
(
df.replace('NAN',np.nan)
.set_index('month')
.rolling(window=window, min_periods=1)
.apply(np.nanprod)
.reset_index()
).iloc[window-1:]
Output
month A B C D
4 1994-11 5.0 16.0 NaN 4.0
5 1995-07 5.0 16.0 NaN 16.0
6 1995-08 1.0 16.0 NaN 16.0
CodePudding user response:
In case somebody comes to this: I have solved in the following way: df = df.rolling(window=5, min_periods=1).apply(lambda x: np.prod(1 x) - 1)