I have a dataframe with duplicates:
timestamp  id  ch  is_eval  c
12         1   1   False    2
13         1   0   False    1
12         1   1   True     4
13         1   0   False    3
When there are duplicates, it is always when
I want to drop duplicates on the key (timestamp, id, ch),
but keep the row where is_eval is True.
That is, if a group of duplicates contains a row with is_eval == True,
keep that row. Otherwise, it doesn't matter which row is kept.
So the output here should be:
12         1   1   True     4
13         1   0   False    1
How can I do it?
Answer:
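Sort so that rows with is_eval == True come first, then drop duplicates on the key. drop_duplicates keeps the first occurrence by default, and the stable sort (kind='mergesort') preserves the original order among rows with the same is_eval value:
df = df.sort_values('is_eval', kind='mergesort', ascending=False).drop_duplicates(['timestamp','id','ch'])
print(df)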
   timestamp  id  ch  is_eval  c
2         12   1   1     True  4
1         13   1   0    False  1
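
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question; the names `key` and `deduped` and the trailing `sort_index()` call are my own additions for illustration, not part of the one-liner above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "timestamp": [12, 13, 12, 13],
    "id": [1, 1, 1, 1],
    "ch": [1, 0, 1, 0],
    "is_eval": [False, False, True, False],
    "c": [2, 1, 4, 3],
})

# Columns that define a duplicate (hypothetical variable name).
key = ["timestamp", "id", "ch"]

# A stable descending sort moves is_eval == True rows to the front;
# drop_duplicates then keeps the first row it meets for each key.
deduped = (
    df.sort_values("is_eval", ascending=False, kind="mergesort")
      .drop_duplicates(subset=key)
      .sort_index()  # optional: put surviving rows back in original order
)

print(deduped)
```

The `sort_index()` step is only there if the original row order matters; drop it to get the rows in the same order as the output shown above.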