I have a dataset with 2 columns and 4 rows:
Data:
user_id,city
123,delhi
456,Pune
123,Goa
789,Hyderabad
When I check for duplicated user_id values, I get only 1 row. Code:
df2[df2["user_id"].duplicated()]["user_id"]
Output:
1 123
Name: user_id, dtype: int64
When I try to select the duplicate rows based on user_id:
df2[df2["user_id"].duplicated()]
I get only 1 record in the output:
1 123 Delhi
There are no junk characters or spaces in the user_id column.
How can I find all rows with a duplicated user_id and delete one of them?
I tried deleting by row index position, but that didn't help.
CodePudding user response:
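Please try drop_duplicates. It returns a new DataFrame, so assign the result:
df2 = df2.drop_duplicates(subset=['user_id'], keep='first')
Both duplicated() and drop_duplicates() default to keep='first', which treats only the second and later occurrences of a value as duplicates. That is why your check returns a single row.
You can use keep='first' to retain the first occurrence of each user_id,
or keep='last' to retain the last one. To list every row that shares a user_id, pass keep=False to duplicated().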
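To make the effect of each keep option concrete, here is a minimal sketch. It rebuilds the four rows from the question by hand, assuming a default 0..3 index; the DataFrame construction and the names later_dups, all_dups, kept_first and kept_last are only for illustration.

```python
import pandas as pd

# Sample data copied from the question (default integer index assumed).
df2 = pd.DataFrame(
    {
        "user_id": [123, 456, 123, 789],
        "city": ["delhi", "Pune", "Goa", "Hyderabad"],
    }
)

# Default keep="first": only repeats after the first occurrence are flagged.
later_dups = df2[df2["user_id"].duplicated()]

# keep=False flags every row whose user_id occurs more than once.
all_dups = df2[df2["user_id"].duplicated(keep=False)]

# drop_duplicates returns a new DataFrame; df2 itself is left unchanged.
kept_first = df2.drop_duplicates(subset=["user_id"], keep="first")
kept_last = df2.drop_duplicates(subset=["user_id"], keep="last")

print(all_dups)
print(kept_first)
print(kept_last)
```

With this sample, all_dups holds both rows for user_id 123, kept_first retains the delhi row, and kept_last retains the Goa row.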
CodePudding user response:
In your case, you can invert the duplicated() mask with ~ to keep only the first occurrence of each user_id:
nodup = df2[~df2["user_id"].duplicated()]
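As a rough sketch of how the inverted mask behaves, the snippet below applies it to the question's sample rows (rebuilt by hand, default index assumed). The names is_repeat and same are hypothetical helpers, and the final reset_index call is an optional extra step, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (default integer index assumed).
df2 = pd.DataFrame(
    {
        "user_id": [123, 456, 123, 789],
        "city": ["delhi", "Pune", "Goa", "Hyderabad"],
    }
)

# True for the second and later occurrences of each user_id.
is_repeat = df2["user_id"].duplicated()

# ~ inverts the mask, so only the first occurrence of each user_id remains.
nodup = df2[~is_repeat]

# The mask approach selects the same rows as drop_duplicates(keep="first").
same = nodup.equals(df2.drop_duplicates(subset=["user_id"], keep="first"))

# Optional: renumber the remaining rows 0..n-1.
nodup = nodup.reset_index(drop=True)

print(nodup)
```

The mask form is handy when you also want to inspect or count the flagged rows before dropping them.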