If I have a dataframe
Index  City        Country  State
0      Chicago     US       IL
1      Sacramento  US       CA
2      Sacramento  US
3      Naperville  US       IL
I want to find rows with duplicate values for 'City' and 'Country' but only drop the row with no entry for 'State', i.e. drop row #2.
What is the best way to approach this using Pandas?
CodePudding user response:
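For reference, the example frame can be rebuilt with a short sketch like the one below; it assumes the blank 'State' entry is stored as NaN (if it is an empty string instead, convert it to NaN first).

import pandas as pd
import numpy as np

# Rebuild the example; the missing State in row 2 is assumed to be NaN.
df = pd.DataFrame({
    'City': ['Chicago', 'Sacramento', 'Sacramento', 'Naperville'],
    'Country': ['US', 'US', 'US', 'US'],
    'State': ['IL', 'CA', np.nan, 'IL'],
})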
Use a boolean mask to get the index of the rows to delete, then pass that index to drop with inplace=True:
# Select the later (City, Country) duplicates whose State is missing, then drop them.
df.drop(df.loc[df.duplicated(['City', 'Country'])
               & df['State'].isna()].index, inplace=True)
print(df)
# Output:
         City Country State
0     Chicago      US    IL
1  Sacramento      US    CA
3  Naperville      US    IL
Note: @QuangHoang's answer is the mirror image of this one: I drop the bad rows, he keeps the good ones. Honestly, I prefer his method.
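For completeness, here is a sketch of that keep-the-good-rows style (my own rephrasing of the idea, not a quote of the referenced answer): select the complement of the same mask instead of dropping.

# Keep everything except the (City, Country) duplicates with a missing State.
df = df[~(df.duplicated(['City', 'Country']) & df['State'].isna())]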