Remove second duplicate row, keep first and fill NaN with value from second

Time:05-21

I have some duplicate rows in my dataframe. The first occurrence has a NaN value in a specific column, and the second occurrence has a value in the same column. I want to remove the duplicate, keep the first occurrence, and replace its NaN value with the value from the second occurrence. Like this:

Name        Rank  City
Andre Ryan  NaN   London
Andre Ryan  86    Paris
...         ...   ...

The goal

Name        Rank  City
Andre Ryan  86    London
...         ...   ...

CodePudding user response:

Here's one way to dedup by using groupby and agg on this example dataset:

import pandas as pd

#create the test table
df = pd.DataFrame({
    'Name':['A','A','B','C','B','C'],
    'Rank':[None,1,None,None,2,3],
    'City':['q','r','s','t','u','v'],
})

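# Group by Name and aggregate to dedup.
# 'first' skips NaN, so Rank takes the first non-null value in each group
# (and stays NaN, rather than raising, if a group has no Rank at all).
dedup_df = df.groupby('Name').agg(
    Rank = ('Rank', 'first'),
    City = ('City', 'first'),
).reset_index()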

dedup_df

Output

  Name  Rank City
0    A   1.0    q
1    B   2.0    s
2    C   3.0    t
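
Since GroupBy.first skips missing values by default, the same result can probably be had without spelling out each column in agg. A minimal sketch, assuming every non-key column should take its first non-null value within the group; the helper name dedup_first_non_null is made up for illustration:

```python
import pandas as pd


def dedup_first_non_null(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical helper: one row per key, each column's first non-null value."""
    # sort=False keeps groups in order of first appearance instead of sorting keys
    return df.groupby(key, sort=False, as_index=False).first()


# Same test table as in the answer above
df = pd.DataFrame({
    'Name': ['A', 'A', 'B', 'C', 'B', 'C'],
    'Rank': [None, 1, None, None, 2, 3],
    'City': ['q', 'r', 's', 't', 'u', 'v'],
})

dedup_df = dedup_first_non_null(df, 'Name')
print(dedup_df)
```

Keep in mind that groupby drops rows whose key is NaN by default, so this only fits if Name is always present.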

CodePudding user response:

You can back-fill the missing values within each group with groupby.bfill, then keep the first row per name with drop_duplicates:

out = (df.assign(**df.groupby('Name').bfill())
       .drop_duplicates('Name', keep='first'))
print(out)

         Name  Rank    City
0  Andre Ryan  86.0  London
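
To try this on the data from the question, here is a small sketch that back-fills only the Rank column, so the other columns keep the first row's values as they are. The sample frame is reconstructed from the question, and restricting the fill to a single column is an assumption about what is wanted:

```python
import numpy as np
import pandas as pd

# Sample rows reconstructed from the question
df = pd.DataFrame({
    'Name': ['Andre Ryan', 'Andre Ryan'],
    'Rank': [np.nan, 86],
    'City': ['London', 'Paris'],
})

# Back-fill Rank within each Name group, leaving the other columns untouched
filled = df.assign(Rank=df.groupby('Name')['Rank'].bfill())

# Keep the first row per Name; its Rank now holds the value from the later duplicate
out = filled.drop_duplicates('Name', keep='first')
print(out)
```

Unlike the groupby and agg approach, drop_duplicates keeps the original index and row order of the surviving rows.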