import numpy as np
import pandas as pd

test_df = pd.DataFrame({'a': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, 6]})
test_df
     a
0  NaN
1  NaN
2  NaN
3  4.0
4  NaN
5  NaN
6  6.0
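I'm trying to backfill each run of NaN values with the next real value divided by the length of the run, counting the real value itself (so 4 preceded by three NaNs becomes four rows of 1.0). This is the output I'm trying to get: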
     a
0  1.0
1  1.0
2  1.0
3  1.0
4  2.0
5  2.0
6  2.0
CodePudding user response:
Try:
# identify the blocks by cumsum on the reversed non-nan series
groups = test_df['a'].notna()[::-1].cumsum()
# groupby and transform
test_df['a'] = test_df['a'].fillna(0).groupby(groups).transform('mean')
Output:
     a
0  1.0
1  1.0
2  1.0
3  1.0
4  2.0
5  2.0
6  2.0
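To see why the reversed cumsum identifies the blocks, here is a minimal sketch that exposes the intermediate group labels on the same data. The names `labels` and `result` are only for illustration and are not part of the answer above. It relies on `groupby` aligning the key Series on the index, so the reversed order of `groups` does not need to be undone.

```python
import numpy as np
import pandas as pd

test_df = pd.DataFrame({'a': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, 6]})

# Scanning from the bottom, every non-NaN value bumps the counter,
# so each real value shares a label with the NaNs directly above it.
groups = test_df['a'].notna()[::-1].cumsum()

# Labels viewed in the original row order (illustrative only).
labels = groups.sort_index().tolist()

# With NaNs replaced by 0, the block mean equals value / block size.
result = test_df['a'].fillna(0).groupby(groups).transform('mean')

print(labels)
print(result.tolist())
```

Rows 0 to 3 share one label and rows 4 to 6 share another, which is why the mean of each block gives 4/4 and 6/3.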
CodePudding user response:
IIUC use:
# get reverse group
group = test_df.loc[::-1,'a'].notna().cumsum()
# get size and divide
test_df['a'] = (test_df['a']
                .bfill()
                .div(test_df.groupby(group)['a'].transform('size'))
                )
Or with rdiv:
test_df['a'] = (test_df
                .groupby(group)['a']
                .transform('size')
                .rdiv(test_df['a'].bfill())
                )
Output (as new column for clarity):
     a   a2
0  NaN  1.0
1  NaN  1.0
2  NaN  1.0
3  4.0  1.0
4  NaN  2.0
5  NaN  2.0
6  6.0  2.0
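As a rough sketch, the bfill-and-divide approach can be wrapped in a small function. The name `spread_backfill` is hypothetical, not something from pandas. The example also shows one case the question does not cover: trailing NaNs with no real value after them stay NaN with this approach, whereas the `fillna(0)` plus `mean` variant would turn them into 0.

```python
import numpy as np
import pandas as pd

def spread_backfill(s: pd.Series) -> pd.Series:
    """Hypothetical helper: spread each real value evenly over itself
    and the run of NaNs directly before it."""
    # Block labels: count non-NaN values from the bottom up.
    group = s[::-1].notna().cumsum()
    # Size of each block, NaNs included ('size' counts every row).
    size = s.groupby(group).transform('size')
    # Pull the real value up over its NaNs, then divide by the block size.
    return s.bfill().div(size)

out = spread_backfill(pd.Series([np.nan, np.nan, np.nan, 4, np.nan, np.nan, 6]))

# A trailing NaN has nothing to backfill from, so it remains NaN.
tail = spread_backfill(pd.Series([np.nan, 2, np.nan]))

print(out.tolist())
print(tail.tolist())
```

Whether trailing NaNs should end up as NaN or 0 depends on the data, so it is worth checking which behaviour is wanted before choosing between the two answers.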