Filling missing values based on multi-column subgroup

Time:12-01

I want to fill the missing "Age" values in a DataFrame with the mean "Age" of the subgroup defined by two columns.

df.groupby(["col_X","col_Y"])["Age"].mean()

The code above returns the means of these sub-groups:

col_X   col_Y
X       1         35
        2         29
        3         22
Y       1         41
        2         31
        3         27

I have a feeling this can be achieved by using the .map function:

df.loc[df['Age'].isnull(),'Age'] = df[['col_X',"col_Y"]].map(something)

Can anybody help me with this?

CodePudding user response:

It's better to use groupby().transform, which returns a Series with the same index as df, so each row gets the mean of its own group. You can pass that Series straight to fillna:

df['Age'] = df['Age'].fillna(df.groupby(['col_X','col_Y'])['Age'].transform('mean'))
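
To see what this does, here is a small self-contained sketch. The sample data is made up for illustration (it is not the asker's DataFrame), and the intermediate name group_mean is just a convenience; only the column names col_X, col_Y and Age come from the question.

```python
import numpy as np
import pandas as pd

# Made-up sample data: two subgroups, each with one missing Age.
df = pd.DataFrame({
    "col_X": ["X", "X", "X", "Y", "Y", "Y"],
    "col_Y": [1, 1, 1, 2, 2, 2],
    "Age": [30.0, 40.0, np.nan, 20.0, np.nan, 24.0],
})

# transform("mean") broadcasts each group's mean back onto every row of
# that group, so the result is aligned with df's index.
group_mean = df.groupby(["col_X", "col_Y"])["Age"].transform("mean")

# fillna only touches rows where Age is missing; existing values are kept.
df["Age"] = df["Age"].fillna(group_mean)

print(df)
```

Note that if every "Age" in a subgroup is missing, that group's mean is NaN as well, so those rows stay unfilled and would need a separate fallback such as the overall mean.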