I have a pandas DataFrame with columns name, school and marks:
name school marks
tom HBS 55
tom HBS 54
tom HBS 12
mark HBS 28
mark HBS 19
lewis HBS 88
How can I drop the last duplicate row of each name/school group and keep the remaining data? Expected result:
name school marks
tom HBS 55
tom HBS 54
mark HBS 28
lewis HBS 88
I tried this:
df.drop_duplicates(['name','school'], keep='last')
print(df)
CodePudding user response:
If you want to drop only the last duplicate, you need to use two masks:
m1 = df.duplicated(['name','school'], keep="last") # True for every duplicated row except the last one in its group
m2 = ~df.duplicated(['name','school'], keep=False) # True for rows that have no duplicate at all
new_df = df[m1|m2]
output:
name school marks
0 tom HBS 55
1 tom HBS 54
3 mark HBS 28
5 lewis HBS 88
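For anyone who wants to reproduce this, here is a minimal self-contained sketch. The construction of df is my assumption based on the table in the question (default integer index, marks as integers), and the keys variable is just a helper name I introduced:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed default RangeIndex).
df = pd.DataFrame({
    "name": ["tom", "tom", "tom", "mark", "mark", "lewis"],
    "school": ["HBS"] * 6,
    "marks": [55, 54, 12, 28, 19, 88],
})

keys = ["name", "school"]

# True for every row that has a later row with the same keys,
# i.e. all rows of a duplicated group except its last one.
m1 = df.duplicated(keys, keep="last")

# True for rows whose key combination occurs exactly once.
m2 = ~df.duplicated(keys, keep=False)

# Keep the non-last rows of duplicated groups plus all unique rows.
new_df = df[m1 | m2]
print(new_df)
```

Note that duplicated and boolean indexing both return new objects, so df itself is left unchanged; the result has to be assigned to a name such as new_df.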
CodePudding user response:
I adapted @DSM's answer (from here), taking into account that you also want to keep rows that have no duplicates:
df.groupby("name", as_index=False).apply(lambda x: x if len(x)==1 else x.iloc[:-1]).reset_index(drop=True)
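The same idea can probably be written without apply, which may be easier to read on larger frames. This is only a sketch under the assumption that df looks like the table in the question; the names grouped, from_end and group_size are ones I made up for illustration:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed default RangeIndex).
df = pd.DataFrame({
    "name": ["tom", "tom", "tom", "mark", "mark", "lewis"],
    "school": ["HBS"] * 6,
    "marks": [55, 54, 12, 28, 19, 88],
})

# sort=False keeps the groups in their order of appearance.
grouped = df.groupby(["name", "school"], sort=False)

# Position of each row counted from the end of its group (0 = last row).
from_end = grouped.cumcount(ascending=False)

# Number of rows in the group each row belongs to.
group_size = grouped["marks"].transform("size")

# Drop the last row of each group, unless the group has only one row.
new_df = df[(from_end > 0) | (group_size == 1)]
print(new_df)
```

Unlike the apply version, this keeps the original index labels and row order, so no reset_index call is needed unless you want a fresh 0..n-1 index.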