I want to fill the missing "Age" values in a DataFrame with the mean of the subgroup defined by two columns.
df.groupby(['col_X', 'col_Y'])['Age'].mean()
The code above returns the means of these sub-groups:
col_X  col_Y
X      1        35
       2        29
       3        22
Y      1        41
       2        31
       3        27
I have a feeling this can be achieved by using the .map function:
df.loc[df['Age'].isnull(),'Age'] = df[['col_X',"col_Y"]].map(something)
Can anybody help me with this?
Answer:
It's better to use groupby().transform, which returns a Series with the same index as df, so you can pass it straight to fillna:
df['Age'] = df['Age'].fillna(df.groupby(['col_X', 'col_Y'])['Age'].transform('mean'))
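To see why this works, here is a small self-contained sketch. The sample data is made up for illustration (it is not the asker's DataFrame), and group_mean is just a helper name I introduced. The point is that transform('mean') gives every row its own group's mean, aligned on the original index, so fillna only touches the rows where Age is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two subgroups, each with one missing Age.
df = pd.DataFrame({
    "col_X": ["X", "X", "X", "Y", "Y", "Y"],
    "col_Y": [1, 1, 1, 2, 2, 2],
    "Age": [30.0, 40.0, np.nan, 20.0, np.nan, 24.0],
})

# One value per row: the mean Age of that row's (col_X, col_Y) group.
# NaN values are skipped when the mean is computed.
group_mean = df.groupby(["col_X", "col_Y"])["Age"].transform("mean")

# fillna aligns on the index, so only the missing rows are replaced.
df["Age"] = df["Age"].fillna(group_mean)

print(df)
```

Note that if every Age in a subgroup is missing, that group's mean is NaN as well, so those rows stay unfilled and would need a fallback such as the overall mean.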