Remove second duplicate row, keep fisrt and fill NaN with value from second-CodePudding

I have some duplicate rows in my dataframe, the first occurrence has Nan value in specific column and the second occurrence has that value in the sma ecolumn. I want to remove duplicate, keep the fisrt occurrence and replace Nan value with value from second occurrence. Like this:

Name	Rank	City
Andre Ryan	NaN	London
Andre Ryan	86	Paris
...	...-	...

The goal

Name	Rank	City
Andre Ryan	86	London
...	...	...

CodePudding user response：

Here's one way to dedup by using groupby and agg on this example dataset:

import pandas as pd

#create the test table
df = pd.DataFrame({
    'Name':['A','A','B','C','B','C'],
    'Rank':[None,1,None,None,2,3],
    'City':['q','r','s','t','u','v'],
})

#groupby and agg to dedup
dedup_df = df.groupby('Name').agg(
    Rank = ('Rank',lambda ranks: ranks.dropna().iloc[0]),
    City = ('City','first'),
).reset_index()

dedup_df

Output

CodePudding user response：

You can try groupby.bfill then drop_duplicates

out = (df.assign(**df.groupby('Name').bfill())
       .drop_duplicates('Name', keep='first'))

print(out)

         Name  Rank    City
0  Andre Ryan  86.0  London