cross join pandas dataframe

A	B	C
aaa	01-03-2022 12:40:00	orange
aaa	01-03-2022 12:40:10	apple
aaa	01-03-2022 12:40:00	kiwi
aaa	01-03-2022 12:40:08	apple
bbb	15-03-2022 13:10:10	orange
bbb	15-03-2022 13:10:18	apple
bbb	15-03-2022 13:10:40	kiwi
bbb	15-03-2022 13:10:15	apple

In the dataframe above, whenever the value 'orange' is present for a user in column C, I want to select the earliest date for that same user among the rows where column C is 'apple'. So the value 'orange' in column C should be retained, but the date in column B should be the one from that user's earliest 'apple' row. The expected output is:

A	B	C
aaa	01-03-2022 12:40:08	orange
bbb	15-03-2022 13:10:15	orange

CodePudding user response：

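We can do this in two steps: first keep only the users that have an 'orange' row, then take each of those users' earliest 'apple' row and relabel it as 'orange'.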

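# Step 1: keep only the rows of users that have at least one 'orange' in column C
out = df[df.A.isin(df.loc[df['C'].isin(['orange']),'A'])]

# Step 2: take the earliest 'apple' row per user; copy() avoids a possible SettingWithCopyWarning below
out = out.loc[out.C.eq('apple')].sort_values('B').drop_duplicates('A').copy()

# Relabel the selected rows as 'orange'
out['C'] = 'orange'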
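
One thing to watch: if column B holds strings, sort_values('B') sorts them as text, and with a day-month-year layout that is not always chronological (for example '01-04-2022' sorts before '15-03-2022'). Here is a minimal self-contained sketch of the same two steps with B parsed to datetimes first. The format string is an assumption based on the sample data, and the name orange_users is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["aaa", "aaa", "aaa", "aaa", "bbb", "bbb", "bbb", "bbb"],
    "B": ["01-03-2022 12:40:00", "01-03-2022 12:40:10",
          "01-03-2022 12:40:00", "01-03-2022 12:40:08",
          "15-03-2022 13:10:10", "15-03-2022 13:10:18",
          "15-03-2022 13:10:40", "15-03-2022 13:10:15"],
    "C": ["orange", "apple", "kiwi", "apple",
          "orange", "apple", "kiwi", "apple"],
})

# Assumption: B is day-month-year, as the sample data suggests.
df["B"] = pd.to_datetime(df["B"], format="%d-%m-%Y %H:%M:%S")

# Step 1: users that have at least one 'orange' row
orange_users = df.loc[df["C"].eq("orange"), "A"]

# Step 2: earliest 'apple' row per such user, relabelled as 'orange'
out = (
    df[df["A"].isin(orange_users) & df["C"].eq("apple")]
    .sort_values("B")          # chronological now that B is datetime
    .drop_duplicates("A")      # keeps the first (earliest) row per user
    .assign(C="orange")        # returns a new frame, so no chained-assignment warning
    .sort_values("A")
    .reset_index(drop=True)
)
print(out)
```

With the sample data this should leave one row per user: aaa at 12:40:08 and bbb at 13:10:15, both labelled orange.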

CodePudding user response：

# Import Your Data
import pandas as pd

df = pd.DataFrame({'A':['aaa','aaa','aaa','aaa','bbb','bbb','bbb','bbb'],
'B':['01-03-2022 12:40:00','01-03-2022 12:40:10','01-03-2022 12:40:00','01-03-2022 12:40:08','15-03-2022 13:10:10','15-03-2022 13:10:18','15-03-2022 13:10:40','15-03-2022 13:10:15'],
'C':['orange','apple','kiwi','apple','orange','apple','kiwi','apple']})

# Data Pre-Process
df.sort_values(['A','C','B'],ascending=[True,True,True],inplace=True)
df['Dup_Num']=df.groupby(['A','C']).cumcount() + 1
df_final = df.loc[df['Dup_Num'] ==1]

# Output Show
df_final
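
Note that the ranking step keeps the earliest row for every (A, C) pair, so the result still contains the kiwi and orange rows with their own timestamps. A possible way to get from there to the output asked for in the question is sketched below. It assumes B is parsed as day-month-year datetimes, and the names first_rows, has_orange and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["aaa", "aaa", "aaa", "aaa", "bbb", "bbb", "bbb", "bbb"],
    "B": ["01-03-2022 12:40:00", "01-03-2022 12:40:10",
          "01-03-2022 12:40:00", "01-03-2022 12:40:08",
          "15-03-2022 13:10:10", "15-03-2022 13:10:18",
          "15-03-2022 13:10:40", "15-03-2022 13:10:15"],
    "C": ["orange", "apple", "kiwi", "apple",
          "orange", "apple", "kiwi", "apple"],
})

# Assumption: B is day-month-year, as the sample data suggests.
df["B"] = pd.to_datetime(df["B"], format="%d-%m-%Y %H:%M:%S")

# Rank rows within each (A, C) group by time; rank 1 is the earliest.
df = df.sort_values(["A", "C", "B"])
df["Dup_Num"] = df.groupby(["A", "C"]).cumcount() + 1
first_rows = df[df["Dup_Num"] == 1]   # one row per (A, C) pair

# Users whose earliest rows include an 'orange'
has_orange = first_rows.loc[first_rows["C"].eq("orange"), "A"]

# Keep their earliest 'apple' row and relabel it as 'orange'.
result = (
    first_rows[first_rows["C"].eq("apple") & first_rows["A"].isin(has_orange)]
    .assign(C="orange")
    .drop(columns="Dup_Num")
    .reset_index(drop=True)
)
print(result)
```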

Hi Brother,

Please use the code above for your data.

If you have any questions, please let me know and we can discuss.

Thanks Leon