A | B | C |
---|---|---|
aaa | 01-03-2022 12:40:00 | orange |
aaa | 01-03-2022 12:40:10 | apple |
aaa | 01-03-2022 12:40:00 | kiwi |
aaa | 01-03-2022 12:40:08 | apple |
bbb | 15-03-2022 13:10:10 | orange |
bbb | 15-03-2022 13:10:18 | apple |
bbb | 15-03-2022 13:10:40 | kiwi |
bbb | 15-03-2022 13:10:15 | apple |
In the above dataframe, whenever a value 'orange' is present for a user in column C, i want to select the earliest date for the same user and that date should correspond to value apple in column C. So if a value 'orange' is present in column C then that value should be retained but the values (date) in column B should correspond to that of 'apple'
A | B | C |
---|---|---|
aaa | 01-03-2022 12:40:08 | orange |
bbb | 15-03-2022 13:10:15 | orange |
CodePudding user response:
So we do two steps
out = df[df.A.isin(df.loc[df['C'].isin(['orange']),'A'])]
out = out.loc[out.C.eq('apple')].sort_values('B').drop_duplicates('A')
out['C'] = 'orange'
CodePudding user response:
# Import Your Data
df = pd.DataFrame({'A':['aaa','aaa','aaa','aaa','bbb','bbb','bbb','bbb'],
'B':['01-03-2022 12:40:00','01-03-2022 12:40:10','01-03-2022 12:40:00','01-03-2022 12:40:08','15-03-2022 13:10:10','15-03-2022 13:10:18','15-03-2022 13:10:40','15-03-2022 13:10:15'],
'C':['orange','apple','kiwi','apple','orange','apple','kiwi','apple']})
# Data Pre-Process
df.sort_values(['A','C','B'],ascending=[True,True,True],inplace=True)
df['Dup_Num']=df.groupby(['A','C']).cumcount() 1
df_final = df.loc[df['Dup_Num'] ==1]
# Output Show
df_final
Hi Brother,
Please use the code above for your data.
If you have any question please let me know and we can discuss
Thanks Leon