Backfill column values using real value divided by number of preceding NA values in Pandas

Time:09-15

import numpy as np
import pandas as pd

test_df = pd.DataFrame({'a': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, 6]})
test_df
    a
0   NaN
1   NaN
2   NaN
3   4.0
4   NaN
5   NaN
6   6.0

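I'm trying to backfill each run of NaN values with the next real value divided by the number of rows in that run, counting the real value itself. For example, 4 is spread over rows 0 to 3 as 4 / 4 = 1, and 6 is spread over rows 4 to 6 as 6 / 3 = 2. This is what I'm trying to get: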

    a
0   1.0
1   1.0
2   1.0
3   1.0
4   2.0
5   2.0
6   2.0

CodePudding user response:

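Try grouping each real value with the NaN rows above it, then taking the group mean after filling NaN with 0: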

# identify the blocks by cumsum on the reversed non-nan series
groups = test_df['a'].notna()[::-1].cumsum()

# groupby and transform 
test_df['a'] = test_df['a'].fillna(0).groupby(groups).transform('mean')

Output:

     a
0  1.0
1  1.0
2  1.0
3  1.0
4  2.0
5  2.0
6  2.0
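
The reason the mean works can be shown in a self-contained sketch. Once the NaN values are replaced by 0, each block's sum is just its one real value, so the block mean equals the real value divided by the block size. The helper name backfill_by_block_mean is hypothetical and only used here for illustration:

```python
import numpy as np
import pandas as pd


def backfill_by_block_mean(s: pd.Series) -> pd.Series:
    """Hypothetical helper: spread each real value over itself and the NaNs above it."""
    # Counting non-NaN values from the bottom up gives one label per block:
    # a real value and the NaN rows directly above it share the same label.
    groups = s.notna()[::-1].cumsum()
    # groupby aligns `groups` on the index, so its reversed order does not matter.
    # With NaN replaced by 0, mean = real value / block size.
    return s.fillna(0).groupby(groups).transform('mean')


test_df = pd.DataFrame({'a': [np.nan, np.nan, np.nan, 4, np.nan, np.nan, 6]})
result = backfill_by_block_mean(test_df['a'])
print(result.tolist())  # [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```

Note that this sketch assumes the real values are not themselves 0 by accident of fillna; a genuine 0 in the data is handled the same way as any other real value because the grouping is built from notna() before the fill.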

CodePudding user response:

If I understand correctly, use:

# label each block by counting non-NaN values from the bottom up
group = test_df.loc[::-1,'a'].notna().cumsum()

# backfill the real value, then divide by the size of its block
test_df['a'] = (test_df['a']
                .bfill()
                .div(test_df.groupby(group)['a'].transform('size'))
               )

Or with rdiv:

test_df['a'] = (test_df
                .groupby(group)['a']
                .transform('size')
                .rdiv(test_df['a'].bfill())
               )

Output (as new column for clarity):

     a   a2
0  NaN  1.0
1  NaN  1.0
2  NaN  1.0
3  4.0  1.0
4  NaN  2.0
5  NaN  2.0
6  6.0  2.0
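
One difference between the mean-based and size-based approaches only shows up when the column ends with NaN. The sketch below adds a trailing NaN to the question's data (an assumption, not part of the original example) and runs both on a plain Series:

```python
import numpy as np
import pandas as pd

# Question's data plus one trailing NaN (added here for illustration only)
s = pd.Series([np.nan, np.nan, np.nan, 4, np.nan, np.nan, 6, np.nan])

# Block labels: count non-NaN values from the bottom up.
# The trailing NaN has no real value below it, so it lands in block 0.
group = s[::-1].notna().cumsum()

# Size-based: backfill, then divide by block size
by_size = s.bfill().div(s.groupby(group).transform('size'))

# Mean-based: fill NaN with 0, then take the block mean
by_mean = s.fillna(0).groupby(group).transform('mean')

print(pd.DataFrame({'a': s, 'by_size': by_size, 'by_mean': by_mean}))
```

Rows 0 to 6 match in both columns (1.0 and 2.0 as above). For the trailing row, the size-based version leaves NaN because bfill has nothing to fill from, while the mean-based version produces 0.0. Which one is preferable depends on how unfinished blocks should be treated.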