Pandas drop last duplicate record and keep remaining

Time:12-04

I have a pandas DataFrame with the columns name, school and marks:

name  school  marks

tom     HBS     55
tom     HBS     54
tom     HBS     12
mark    HBS     28
mark    HBS     19
lewis   HBS     88

How do I drop the last duplicate row in each group and keep the remaining data?

name  school  marks

tom     HBS     55
tom     HBS     54
mark    HBS     28
lewis   HBS     88

I tried this:

df.drop_duplicates(['name','school'], keep='last')
print(df)

CodePudding user response:

If you want to drop only the last duplicate of each group, you can combine two boolean masks:

m1 = df.duplicated(['name','school'], keep="last") # True for every row of a group except its last one
m2 = ~df.duplicated(['name','school'], keep=False) # True for rows that are not duplicated at all
new_df = df[m1|m2]

output:

    name school  marks
0    tom    HBS     55
1    tom    HBS     54
3   mark    HBS     28
5  lewis    HBS     88
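
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the two-mask approach. The DataFrame is rebuilt by hand from the table in the question, and the `keys` variable is only a convenience name introduced here:

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "name": ["tom", "tom", "tom", "mark", "mark", "lewis"],
    "school": ["HBS"] * 6,
    "marks": [55, 54, 12, 28, 19, 88],
})

keys = ["name", "school"]  # hypothetical helper name for the grouping columns

# True for every row of a duplicated group except its last occurrence
m1 = df.duplicated(keys, keep="last")
# True for rows whose key combination occurs only once
m2 = ~df.duplicated(keys, keep=False)

new_df = df[m1 | m2]
print(new_df)
```

The original index labels are preserved, so the dropped rows stay visible as gaps in the index.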

CodePudding user response:

I adapted @DSM's answer (from here), taking into account that you also want to keep rows that have no duplicates:

df.groupby("name", as_index=False).apply(lambda x: x if len(x) == 1 else x.iloc[:-1]).reset_index(drop=True)
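
The same groupby idea can also be sketched without `apply`, which tends to be slower on large frames. This is only one possible variant, not part of the answer above: it assumes the grouping should be on both name and school, as in the question's attempt, and the names `size`, `pos_from_end` and `result` are made up for illustration:

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "name": ["tom", "tom", "tom", "mark", "mark", "lewis"],
    "school": ["HBS"] * 6,
    "marks": [55, 54, 12, 28, 19, 88],
})

g = df.groupby(["name", "school"], sort=False)

# Number of rows in the group each row belongs to
size = g["marks"].transform("size")
# Position counted from the end of the group: 0 marks the last row
pos_from_end = g.cumcount(ascending=False)

# Keep rows from single-row groups, plus every row that is not the last of its group
result = df[(size == 1) | (pos_from_end > 0)]
print(result)
```

Unlike the `apply` version, this keeps the rows in their original order and with their original index labels.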