Remove duplicate rows based on columns in pandas dataframe


Hi, I have a dataset with two columns:

Data:

user_id,city
123,delhi
456,Pune
123,Goa
789,Hyderabad

When I check for duplicated user_id values, I get only one row. Code:

df2[df2["user_id"].duplicated()]["user_id"]

Output :

1 123
Name: user_id, dtype: int64

When I try to check the duplicates based on user_id:

df2[df2["user_id"].duplicated()]

I get only one record in the output:

1 123 Delhi

There are no junk characters or spaces in the user_id column.

How do I find all duplicated user_id values and delete one of them?

I tried deleting by row index position, but it didn't help.

CodePudding user response:

Please try using:

df2.drop_duplicates(subset=['user_id'], keep='first')

You can pass keep='first' or keep='last' to choose which of the duplicated rows is retained.
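
For reference, a quick sketch on the sample data from the question (df2 is rebuilt by hand here, so the index values are an assumption):

import pandas as pd

# Rebuild the sample data from the question
df2 = pd.DataFrame({
    "user_id": [123, 456, 123, 789],
    "city": ["delhi", "Pune", "Goa", "Hyderabad"],
})

# keep=False marks every occurrence of a duplicated user_id,
# which answers the "find all duplicated user_id" part
print(df2[df2["user_id"].duplicated(keep=False)])

# drop_duplicates keeps one row per user_id; keep='first' retains
# the first occurrence, keep='last' the last one
deduped = df2.drop_duplicates(subset=["user_id"], keep="first")
print(deduped)

Note that drop_duplicates returns a new DataFrame, so reassign the result (df2 = df2.drop_duplicates(...)) or pass inplace=True if you want to keep it.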

CodePudding user response:

In your case:

nodup = df2[~df2["user_id"].duplicated()]
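
This works because duplicated() with the default keep='first' marks only the second and later occurrences, so the negation keeps exactly one row per user_id. On the sample df2 built above, the expected result would be roughly:

print(nodup)
#    user_id       city
# 0      123      delhi
# 1      456       Pune
# 3      789  Hyderabad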