Home > Blockchain >  Pandas rolling mean only for non-NaNs
Pandas rolling mean only for non-NaNs

Time:11-18

If have a DataFrame:

df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]
                   'A1': [1, 1, 2, 2, 2]
                   'A2': [1, 2, 3, 3, 3]})

I want to create a grouped-by on columns "A1" and "A2" and then apply a rolling-mean on "B" with window 3. If less values are available, that is fine, the mean should still be computed. But I do not want any values if there is no original entry.

Result should be:

pd.DataFrame({'B': [0, 1, 2, np.nan, 3]})

Applying df.rolling(3, min_periods=1).mean() yields:

pd.DataFrame({'B': [0, 1, 2, 2, 3]})

Any ideas?

CodePudding user response:

Reason is for mean with widows=3 is ouput some scalars, not NaNs, possible solution is set NaN manually after rolling:

df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4],
                   'A': [1, 1, 2, 2, 2]})

df['C'] = df['B'].rolling(3, min_periods=1).mean().mask(df['B'].isna())

df['D'] = df.groupby('A')['B'].rolling(3, min_periods=1).mean().droplevel(0).mask(df['B'].isna())
print (df)
     B  A    C    D
0  0.0  1  0.0  0.0
1  1.0  1  0.5  0.5
2  2.0  2  1.0  2.0
3  NaN  2  NaN  NaN
4  4.0  2  3.0  3.0

EDIT: For multiple grouping columns remove levels in Series.droplevel:

df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4],
                   'A1': [1, 1, 2, 2, 2],
                   'A2': [1, 2, 3, 3, 3]})

df['D'] = df.groupby(['A1','A2'])['B'].rolling(3, min_periods=1).mean().droplevel(['A1','A2']).mask(df['B'].isna())
print (df)

     B  A1  A2    D
0  0.0   1   1  0.0
1  1.0   1   2  1.0
2  2.0   2   3  2.0
3  NaN   2   3  NaN
4  4.0   2   3  3.0
  • Related