We have data representing each user's penalty count. It contains NaNs and changes over time (the value only goes up). Below is a subset of the data:
import pandas as pd
import numpy as np
d = {'day':['Monday','Monday','Monday','Tuesday','Tuesday','Tuesday','Wednesday','Thursday','Thursday','Friday'],
'user_id': [1, 4,2,4,4,2,2,1,2,1], 'penalties_count': [1, 3,2,np.nan,4,2,np.nan,2,3,3]}
df = pd.DataFrame(data=d)
display(df)
day user_id penalties_count
0 Monday 1 1.0
1 Monday 4 3.0
2 Monday 2 2.0
3 Tuesday 4 NaN
4 Tuesday 4 4.0
5 Tuesday 2 2.0
6 Wednesday 2 NaN
7 Thursday 1 2.0
8 Thursday 2 3.0
9 Friday 1 3.0
The goal is to fill each NaN cell with the previous value, but only using values from the same user_id. The result should be:
day user_id penalties_count
0 Monday 1 1.0
1 Monday 4 3.0
2 Monday 2 2.0
3 Tuesday 4 3.0
4 Tuesday 4 4.0
5 Tuesday 2 2.0
6 Wednesday 2 2.0
7 Thursday 1 2.0
8 Thursday 2 3.0
9 Friday 1 3.0
But when I use
df.fillna(method='bfill')
The result is incorrect in the fourth row (index 3) for user_id=4 (we should see 3 here, not 4):
day user_id penalties_count
0 Monday 1 1.0
1 Monday 4 3.0
2 Monday 2 2.0
3 Tuesday 4 4.0
4 Tuesday 4 4.0
5 Tuesday 2 2.0
6 Wednesday 2 2.0
7 Thursday 1 2.0
8 Thursday 2 3.0
9 Friday 1 3.0
What can fix the issue?
CodePudding user response:
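If you want to fill NaN values by group, you need to group with groupby first and then fill. It also seems that you need ffill (carry the previous value forward) rather than bfill (pull the next value backward). The call below returns a Series aligned with the original index, so assign it back to the column:
df["penalties_count"] = df.groupby("user_id")["penalties_count"].ffill()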
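As a minimal sketch of the group-wise forward fill, here it is applied to the sample data from the question. It assumes the rows are already in chronological order, as they are in the sample; if they were not, they would need sorting first.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "day": ["Monday", "Monday", "Monday", "Tuesday", "Tuesday",
            "Tuesday", "Wednesday", "Thursday", "Thursday", "Friday"],
    "user_id": [1, 4, 2, 4, 4, 2, 2, 1, 2, 1],
    "penalties_count": [1, 3, 2, np.nan, 4, 2, np.nan, 2, 3, 3],
})

# Forward-fill within each user_id. The result keeps the original row
# index, so it can be assigned straight back to the column.
df["penalties_count"] = df.groupby("user_id")["penalties_count"].ffill()

print(df)
```

Row 3 (user_id=4) is filled with 3.0 from row 1, and row 6 (user_id=2) with 2.0 from row 5, which matches the expected table in the question. Note that a NaN in a user's first row would stay NaN, because there is no earlier value for that user to carry forward.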