Drop duplicates based on 2 columns if the value in another column is null - Pandas


If I have a dataframe

Index   City        Country   State
0       Chicago     US        IL
1       Sacramento  US        CA
2       Sacramento  US
3       Naperville  US        IL

I want to find rows with duplicate values for 'City' and 'Country', but only drop the row with no entry for 'State', i.e. drop row 2.

What is the best way to approach this using Pandas?
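For reference, here is a minimal sketch that builds this example frame, assuming the blank State in row 2 is an actual null (NaN) rather than an empty string:

import pandas as pd
import numpy as np

# Example data from the question; the missing State is assumed to be NaN
df = pd.DataFrame({
    'City':    ['Chicago', 'Sacramento', 'Sacramento', 'Naperville'],
    'Country': ['US', 'US', 'US', 'US'],
    'State':   ['IL', 'CA', np.nan, 'IL'],
})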

CodePudding user response:

Use a boolean mask to get the index of the rows to delete, then pass that index to drop with inplace=True to remove those rows:

# Mask: City/Country duplicates whose State is null
df.drop(df.loc[df.duplicated(['City', 'Country'])
               & df['State'].isna()].index, inplace=True)
print(df)

# Output:
         City Country State
0     Chicago      US    IL
1  Sacramento      US    CA
3  Naperville      US    IL

Note: @QuangHoang's answer is the opposite of this one: I drop the bad rows, he keeps the good rows. Honestly, I prefer his method. A sketch of that keep-based approach is shown below.
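For completeness, here is a sketch along those keep-based lines (not necessarily identical to @QuangHoang's actual code, which isn't reproduced here): keep every row that is either not a City/Country duplicate or has a non-null State.

# Keep rows that are not City/Country duplicates, or that do have a State.
# keep=False marks every member of a duplicate group, so the null-State
# row is filtered out regardless of its position in the group.
out = df[~df.duplicated(['City', 'Country'], keep=False) | df['State'].notna()]
print(out)

This again leaves rows 0, 1 and 3, but returns a new frame instead of mutating df in place.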
