I want to fill NAs using the most recent data and have it apply to groups.
This is my dataframe
pet id weight (lbs)
dog 1 30
dog 2 23
dog 3 NaN
cat 4 10
dog 5 NaN
cat 6 NaN
dog 7 39
cat 8 18
hippo 9 138
This is what I want the output to be
pet id weight (lbs)
dog 1 30
dog 2 23
dog 3 23
cat 4 10
dog 5 23
cat 6 10
dog 7 39
cat 8 18
hippo 9 138
This is the code for reproducing the dataframe-
df = pd.DataFrame({'pets':['dog', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog', 'cat', 'hippo'],
'id':[1, 2, 3, 4, 5, 6, 7, 8, 8],
'Weight':[30, 23, np.nan, 10, np.nan, np.nan, 39, 10, 138]})
In other words, I want to fill in NaNs with the most recent non-null value grouped by pet and order by id. This is the code I tried to use: dataframe.sort_values(by = 'id').groupby('pet').fillna(method = 'ffill')
CodePudding user response:
The problem with the code you have is, pandas GroupBy Object doesn't have sort_values method.
So,first group the dataframe by pet
, then apply a function which does forward-fill on each group by sorting the dataframe on id.
The reamaining part of sort_index
, and reset_index
is there just to get the resulting dataframe in the initial order of pet, and id columns.
out = (df.groupby(['pet'])
.apply(lambda x: x.sort_values('id').ffill())
.sort_index(level=1)
.reset_index(0, drop=True)
)
OUTPUT:
>>> out
pet id weight (lbs)
0 dog 1 30.0
1 dog 2 23.0
2 dog 3 23.0
3 cat 4 10.0
4 dog 5 23.0
5 cat 6 10.0
6 dog 7 39.0
7 cat 8 18.0
8 hippo 9 138.0
CodePudding user response:
I think apply not necessary:
df.assign(**df.sort_values('id').groupby('pet').ffill())
pet id weight
0 dog 1 30.0
1 dog 2 23.0
2 dog 3 23.0
3 cat 4 10.0
4 dog 5 23.0
5 cat 6 10.0
6 dog 7 39.0
7 cat 8 18.0
8 hippo 9 138.0