Home > Mobile >  Rolling mean on a groupby with min_periods=1, but NOT ignoring NANs
Rolling mean on a groupby with min_periods=1, but NOT ignoring NANs

Time:09-06

So far, I've only seen questions being asked about how to ignore NANs while doing a rolling mean on a groupby. But my case is the opposite. I want to include the NANs such that if even one of the values in the rolling windows is NAN, I want the resulting rolling mean to be NAN as well.

Input:

    grouping    value_to_avg
0   1           1.0
1   1           2.0
2   1           3.0
3   1           NaN
4   1           4.0
5   2           5.0
6   2           NaN
7   2           6.0
8   2           7.0
9   2           8.0

Code to create sample input:

data = {'grouping': [1,1,1,1,1,2,2,2,2,2], 'value_to_avg': [1,2,3,np.nan,4,5,np.nan,6,7,8]}  
db = pd.DataFrame(data)

Code that I have tried:

db['rolling_mean_actual'] = db.groupby('grouping')['value_to_avg'].transform(lambda s: s.rolling(window=3, center=True, min_periods=1).mean(skipna=False))

Actual vs. expected output:

  grouping  value_to_avg    rolling_mean_actual rolling_mean_expected
0   1       1.0             1.5                 1.5
1   1       2.0             2.0                 2.0
2   1       3.0             2.5                 NaN
3   1       NaN             3.5                 NaN
4   1       4.0             4.0                 NaN
5   2       5.0             5.0                 NaN
6   2       NaN             5.5                 NaN
7   2       6.0             6.5                 NaN
8   2       7.0             7.0                 7.0
9   2       8.0             7.5                 7.5

You can see above, using skipna=False inside the mean function does not work as expected and still ignores NANs

CodePudding user response:

For me working custom function with np.mean with convert values to numpy array:

roll_window = 3

db['rolling_mean_actual'] = (db.groupby('grouping')['value_to_avg']
                               .transform(lambda s: s.rolling(roll_window, 
                                                              center=True, 
                                                              min_periods=1)
                               .apply(lambda x: np.mean(x.to_numpy())))

You can avoid transform also:

roll_window = 3

db['rolling_mean_actual'] = (db.groupby('grouping')['value_to_avg']
                               .rolling(roll_window, center=True, min_periods=1)
                               .apply(lambda x: np.mean(x.to_numpy()))
                               .droplevel(0))
print (db)

   grouping  value_to_avg  rolling_mean_actual
0         1           1.0                  1.5
1         1           2.0                  2.0
2         1           3.0                  NaN
3         1           NaN                  NaN
4         1           4.0                  NaN
5         2           5.0                  NaN
6         2           NaN                  NaN
7         2           6.0                  NaN
8         2           7.0                  7.0
9         2           8.0                  7.5

CodePudding user response:

pd.concat(
    [
        df,
        df.groupby("grouping")["value_to_avg"].transform(
            lambda x: x.expanding().mean()
        ),
        df.groupby("grouping")["value_to_avg"].apply(
            lambda x: x.rolling(window=3, center=True, min_periods=0).apply(
                lambda x: x.mean() if ~x.isna().any() else np.NaN
            )
        ),
    ],
    axis=1,
)
  • Related