Home > Software engineering >  Front-fill Null values in a dataframe separated by group
Front-fill Null values in a dataframe separated by group

Time:03-01

I want to fill NAs using the most recent data and have it apply to groups.

This is my dataframe

pet    id     weight (lbs)
dog    1      30
dog    2      23
dog    3      NaN
cat    4      10
dog    5      NaN
cat    6      NaN
dog    7      39
cat    8      18
hippo  9      138

This is what I want the output to be

pet    id     weight (lbs)
dog    1      30
dog    2      23
dog    3      23
cat    4      10
dog    5      23
cat    6      10
dog    7      39
cat    8      18
hippo  9      138

This is the code for reproducing the dataframe-

df = pd.DataFrame({'pets':['dog', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog', 'cat', 'hippo'],
                   'id':[1, 2, 3, 4, 5, 6, 7, 8, 8],
                   'Weight':[30, 23, np.nan, 10, np.nan, np.nan, 39, 10, 138]})

In other words, I want to fill in NaNs with the most recent non-null value grouped by pet and order by id. This is the code I tried to use: dataframe.sort_values(by = 'id').groupby('pet').fillna(method = 'ffill')

CodePudding user response:

The problem with the code you have is, pandas GroupBy Object doesn't have sort_values method.

So,first group the dataframe by pet, then apply a function which does forward-fill on each group by sorting the dataframe on id.

The reamaining part of sort_index, and reset_index is there just to get the resulting dataframe in the initial order of pet, and id columns.

out = (df.groupby(['pet'])
        .apply(lambda x: x.sort_values('id').ffill())
        .sort_index(level=1)
        .reset_index(0, drop=True)
         )

OUTPUT:

>>> out
     pet  id  weight (lbs)
0    dog   1          30.0
1    dog   2          23.0
2    dog   3          23.0
3    cat   4          10.0
4    dog   5          23.0
5    cat   6          10.0
6    dog   7          39.0
7    cat   8          18.0
8  hippo   9         138.0

CodePudding user response:

I think apply not necessary:

df.assign(**df.sort_values('id').groupby('pet').ffill())

     pet  id  weight
0    dog   1    30.0
1    dog   2    23.0
2    dog   3    23.0
3    cat   4    10.0
4    dog   5    23.0
5    cat   6    10.0
6    dog   7    39.0
7    cat   8    18.0
8  hippo   9   138.0
  • Related