drop_duplicates even more for a specific column with latest value?

Time:04-15

Is there a way to customize drop_duplicates so that it drops the "kind of" duplicates?

Example: pandas df

Year  Name          ID   City
2011  Superman      101  Metropolis
2011  Batman        102  Gotham
2012  The Batman    102  Gotham
2011  Noobmaster69  103  Online
2011  Noobmaster69  103  Online

I tried drop_duplicates, which removed the exact duplicate row and gave me this:

Year  Name          ID   City
2011  Superman      101  Metropolis
2011  Batman        102  Gotham
2012  The Batman    102  Gotham
2011  Noobmaster69  103  Online

I actually want to reduce it further: for ID 102 I want to keep only the "The Batman" row, since it is the newer entry (2012 > 2011). I'm expecting something like this:

Year  Name          ID   City
2011  Superman      101  Metropolis
2012  The Batman    102  Gotham
2011  Noobmaster69  103  Online

CodePudding user response:

Try this. Duplicates can easily be dropped based on the ID column.

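import pandas as pd

# Read the table data; read_csv already returns a DataFrame,
# so no extra pd.DataFrame(...) wrapper is needed
df = pd.read_csv("your_filename.csv")

# Compare rows on the ID column only and keep the last row for each ID
df = df.drop_duplicates(subset='ID', keep='last')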

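subset='ID' limits the duplicate check to the ID column, and keep='last' keeps the last occurrence of each ID and drops the earlier ones. "Last" means last in row order, so this returns the newest row only when the rows are already ordered by Year; if they are not, sort by Year first.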
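If the rows are not guaranteed to be in Year order, one possible approach is to sort by Year before dropping duplicates. The sketch below is only an illustration: it rebuilds the question's sample table inline (with the two Batman rows deliberately swapped) instead of reading a CSV, and the variable name latest is made up for this example.

```python
import pandas as pd

# Sample data copied from the question, deliberately reordered so the newer
# "The Batman" row comes BEFORE the older "Batman" row.
df = pd.DataFrame(
    {
        "Year": [2011, 2012, 2011, 2011, 2011],
        "Name": ["Superman", "The Batman", "Batman", "Noobmaster69", "Noobmaster69"],
        "ID": [101, 102, 102, 103, 103],
        "City": ["Metropolis", "Gotham", "Gotham", "Online", "Online"],
    }
)

latest = (
    df.sort_values("Year", kind="stable")       # oldest first, ties keep their order
      .drop_duplicates(subset="ID", keep="last")  # keep the newest row per ID
      .sort_index()                               # restore the original row order
      .reset_index(drop=True)
)
print(latest)
```

The sort_index() step is optional; it only puts the surviving rows back in the order they had in the input.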
