Let's say I have a dataframe like this:
name address email paid
0 John 123 State St [email protected] 5.00
1 NaN NaN NaN 15.00
2 John NaN NaN 3.00
3 NaN 100 Owen Ave NaN 10.00
I would like to forward fill both the address and email columns when name is the same, and if the name is NaN, leave the row untouched, after doing so, the dataframe should look like this:
name address email paid
0 John 123 State St [email protected] 5.00
1 NaN NaN NaN 15.00
2 John 123 State St [email protected] 3.00
3 NaN 100 Owen Ave NaN 10.00
I've tried using
df[['address', 'email']] = df[df['name'].notna()].groupby('name', sort=False)[['address', 'email']].ffill()
but it changes the 3rd row address to NaN like this
name address email paid
0 John 123 State St [email protected] 5.00
1 NaN NaN NaN 15.00
2 John 123 State St [email protected] 3.00
3 NaN NaN NaN 10.00
Is there anyway to let pandas leave some rows as they are?
CodePudding user response:
In the last assignment you are not filtering for only not NaN name rows. You are assigning, instead, the group-by results for all the rows.
This should fix the problem:
df.loc[df['name'].notna(),['address', 'email']] = df[df['name'].notna()].groupby('name', sort=False)[['address', 'email']].ffill()
In this way, you are assigning the results only for the rows with Name different from NaN