So far, I've only seen questions being asked about how to ignore NANs while doing a rolling mean on a groupby. But my case is the opposite. I want to include the NANs such that if even one of the values in the rolling windows is NAN, I want the resulting rolling mean to be NAN as well.
Input:
grouping value_to_avg
0 1 1.0
1 1 2.0
2 1 3.0
3 1 NaN
4 1 4.0
5 2 5.0
6 2 NaN
7 2 6.0
8 2 7.0
9 2 8.0
Code to create sample input:
data = {'grouping': [1,1,1,1,1,2,2,2,2,2], 'value_to_avg': [1,2,3,np.nan,4,5,np.nan,6,7,8]}
db = pd.DataFrame(data)
Code that I have tried:
db['rolling_mean_actual'] = db.groupby('grouping')['value_to_avg'].transform(lambda s: s.rolling(window=3, center=True, min_periods=1).mean(skipna=False))
Actual vs. expected output:
grouping value_to_avg rolling_mean_actual rolling_mean_expected
0 1 1.0 1.5 1.5
1 1 2.0 2.0 2.0
2 1 3.0 2.5 NaN
3 1 NaN 3.5 NaN
4 1 4.0 4.0 NaN
5 2 5.0 5.0 NaN
6 2 NaN 5.5 NaN
7 2 6.0 6.5 NaN
8 2 7.0 7.0 7.0
9 2 8.0 7.5 7.5
You can see above, using skipna=False inside the mean function does not work as expected and still ignores NANs
CodePudding user response:
For me working custom function with np.mean
with convert values to numpy array:
roll_window = 3
db['rolling_mean_actual'] = (db.groupby('grouping')['value_to_avg']
.transform(lambda s: s.rolling(roll_window,
center=True,
min_periods=1)
.apply(lambda x: np.mean(x.to_numpy())))
You can avoid transform
also:
roll_window = 3
db['rolling_mean_actual'] = (db.groupby('grouping')['value_to_avg']
.rolling(roll_window, center=True, min_periods=1)
.apply(lambda x: np.mean(x.to_numpy()))
.droplevel(0))
print (db)
grouping value_to_avg rolling_mean_actual
0 1 1.0 1.5
1 1 2.0 2.0
2 1 3.0 NaN
3 1 NaN NaN
4 1 4.0 NaN
5 2 5.0 NaN
6 2 NaN NaN
7 2 6.0 NaN
8 2 7.0 7.0
9 2 8.0 7.5
CodePudding user response:
pd.concat(
[
df,
df.groupby("grouping")["value_to_avg"].transform(
lambda x: x.expanding().mean()
),
df.groupby("grouping")["value_to_avg"].apply(
lambda x: x.rolling(window=3, center=True, min_periods=0).apply(
lambda x: x.mean() if ~x.isna().any() else np.NaN
)
),
],
axis=1,
)