Using pandas fillna with bfill method for particular cells

Time:06-06

We have data recording each user's penalty count over time. The count only ever goes up, and some values are NaN. Below is a subset of the data:

import pandas as pd
import numpy as np
d = {'day':['Monday','Monday','Monday','Tuesday','Tuesday','Tuesday','Wednesday','Thursday','Thursday','Friday'],
     'user_id': [1, 4,2,4,4,2,2,1,2,1], 'penalties_count': [1, 3,2,np.nan,4,2,np.nan,2,3,3]}
df = pd.DataFrame(data=d)
display(df)


      day   user_id     penalties_count
0   Monday      1       1.0
1   Monday      4       3.0
2   Monday      2       2.0
3   Tuesday     4       NaN
4   Tuesday     4       4.0
5   Tuesday     2       2.0
6   Wednesday   2       NaN
7   Thursday    1       2.0
8   Thursday    2       3.0
9   Friday      1       3.0

The goal is to fill each NaN cell with the previous value recorded for the same user_id. The result should be:

     day     user_id  penalties_count
0   Monday      1       1.0
1   Monday      4       3.0
2   Monday      2       2.0
3   Tuesday     4       3.0
4   Tuesday     4       4.0
5   Tuesday     2       2.0
6   Wednesday   2       2.0
7   Thursday    1       2.0
8   Thursday    2       3.0
9   Friday      1       3.0

But when I use

df.fillna(method='bfill')

the result is incorrect in line 4 (index 3) for user_id=4 (we should see 3 here, not 4):

     day     user_id  penalties_count
0   Monday      1       1.0
1   Monday      4       3.0
2   Monday      2       2.0
3   Tuesday     4       4.0
4   Tuesday     4       4.0
5   Tuesday     2       2.0
6   Wednesday   2       2.0
7   Thursday    1       2.0
8   Thursday    2       3.0
9   Friday      1       3.0

What can fix the issue?

CodePudding user response:

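To fill NaN values per group, you need to call groupby before filling; a plain fillna works down the whole column and ignores user_id. You also need ffill rather than bfill here: ffill carries the previous value forward, whereas bfill pulls the next value backward, which is why index 3 got 4 instead of 3. For example: df.groupby("user_id")["penalties_count"].ffill()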
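
As a minimal sketch of that suggestion, the snippet below rebuilds the sample frame from the question and writes the group-wise forward fill back into the column. Assigning the result back to df["penalties_count"] is an assumption about what the asker wants; the answer itself only shows the groupby expression.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "day": ["Monday", "Monday", "Monday", "Tuesday", "Tuesday", "Tuesday",
            "Wednesday", "Thursday", "Thursday", "Friday"],
    "user_id": [1, 4, 2, 4, 4, 2, 2, 1, 2, 1],
    "penalties_count": [1, 3, 2, np.nan, 4, 2, np.nan, 2, 3, 3],
})

# Forward-fill within each user_id group, so a NaN takes the most recent
# earlier value of the same user rather than of the neighbouring row.
df["penalties_count"] = df.groupby("user_id")["penalties_count"].ffill()

print(df)
```

If a user's very first row is NaN, there is no earlier value in that group to carry forward, so it stays NaN. Note also that recent pandas versions deprecate the method= argument of fillna in favour of calling ffill() or bfill() directly.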
