Fill null values using information from another column-CodePudding

I have the following example dataframe:

df = pd.DataFrame({'planet':['Earth','Earth','Mars',np.nan,'Venus','Mars',np.nan,'Venus','Earth',np.nan],
               'group':['alpha','alpha','beta','gamma','delta','beta','delta','delta','gamma','beta']})

Every row is an individual's planet and group. It is given that the planet of a particular group is same for all individuals. So if alpha group is from Earth, all individuals in alpha group are from Earth. I want to use this information to fill the missing values in planet column.

What I did was create a dataframe with rows that are missing planet values:

df2 = df[pd.isna(df['planet'])]

and used the groups in df2 to slice the original dataframe, adding the condition that planet values are not null:

df3 = df[(~pd.isna(df['planet'])) & (df['group'].isin(df2['group']))]
df3 = df3.drop_duplicates()

Now how do I fill in the missing values in the original dataframe (df)? Is there a better method?

CodePudding user response：

if alpha group is from Earth, all individuals in alpha group are from Earth

This suggests that one way that you could use is to groupby.transform the first to fill in the missing values:

df['planet'] = df.groupby('group')['planet'].transform('first')

Output:

  planet  group
0  Earth  alpha
1  Earth  alpha
2   Mars   beta
3  Earth  gamma
4  Venus  delta
5   Mars   beta
6  Venus  delta
7  Venus  delta
8  Earth  gamma
9   Mars   beta