Home > database >  cross join pandas dataframe
cross join pandas dataframe

Time:08-31

A B C
aaa 01-03-2022 12:40:00 orange
aaa 01-03-2022 12:40:10 apple
aaa 01-03-2022 12:40:00 kiwi
aaa 01-03-2022 12:40:08 apple
bbb 15-03-2022 13:10:10 orange
bbb 15-03-2022 13:10:18 apple
bbb 15-03-2022 13:10:40 kiwi
bbb 15-03-2022 13:10:15 apple

In the above dataframe, whenever a value 'orange' is present for a user in column C, i want to select the earliest date for the same user and that date should correspond to value apple in column C. So if a value 'orange' is present in column C then that value should be retained but the values (date) in column B should correspond to that of 'apple'

A B C
aaa 01-03-2022 12:40:08 orange
bbb 15-03-2022 13:10:15 orange

CodePudding user response:

So we do two steps

out = df[df.A.isin(df.loc[df['C'].isin(['orange']),'A'])]

out = out.loc[out.C.eq('apple')].sort_values('B').drop_duplicates('A')

out['C'] = 'orange'

CodePudding user response:

# Import Your Data
df = pd.DataFrame({'A':['aaa','aaa','aaa','aaa','bbb','bbb','bbb','bbb'],
'B':['01-03-2022 12:40:00','01-03-2022 12:40:10','01-03-2022 12:40:00','01-03-2022 12:40:08','15-03-2022 13:10:10','15-03-2022 13:10:18','15-03-2022 13:10:40','15-03-2022 13:10:15'],
'C':['orange','apple','kiwi','apple','orange','apple','kiwi','apple']})

# Data Pre-Process
df.sort_values(['A','C','B'],ascending=[True,True,True],inplace=True)
df['Dup_Num']=df.groupby(['A','C']).cumcount() 1
df_final = df.loc[df['Dup_Num'] ==1]

# Output Show
df_final

Hi Brother,

Please use the code above for your data.

If you have any question please let me know and we can discuss

Thanks Leon

  • Related