I have the following example dataframe:
df = pd.DataFrame({'planet':['Earth','Earth','Mars',np.nan,'Venus','Mars',np.nan,'Venus','Earth',np.nan],
'group':['alpha','alpha','beta','gamma','delta','beta','delta','delta','gamma','beta']})
Every row is an individual's planet and group. It is given that the planet of a particular group is same for all individuals. So if alpha group is from Earth, all individuals in alpha group are from Earth. I want to use this information to fill the missing values in planet column.
What I did was create a dataframe with rows that are missing planet values:
df2 = df[pd.isna(df['planet'])]
and used the groups in df2 to slice the original dataframe, adding the condition that planet values are not null:
df3 = df[(~pd.isna(df['planet'])) & (df['group'].isin(df2['group']))]
df3 = df3.drop_duplicates()
Now how do I fill in the missing values in the original dataframe (df)? Is there a better method?
CodePudding user response:
if alpha group is from Earth, all individuals in alpha group are from Earth
This suggests that one way that you could use is to groupby.transform
the first
to fill in the missing values:
df['planet'] = df.groupby('group')['planet'].transform('first')
Output:
planet group
0 Earth alpha
1 Earth alpha
2 Mars beta
3 Earth gamma
4 Venus delta
5 Mars beta
6 Venus delta
7 Venus delta
8 Earth gamma
9 Mars beta