Home > OS >  How to check if id in DF1 appeared within the past 30 minutes in DF2? using Pandas
How to check if id in DF1 appeared within the past 30 minutes in DF2? using Pandas

Time:04-16

I have DF1 with customer_id, datetime and fruites purchases, and DF2 with customer_id, datetime and Veggies purchases, how to check if within the past 30 minutes of the fruit purchases a customer did a Veggies puchase?

df1.head()
customer_id puchase_date fruit_item
1 2019-08-01 23:55:55 Apples
2 2019-08-01 23:58:32 Bananas
df2.head()
customer_id puchase_date veggies_item
1 2019-08-01 23:44:55 Eggplants
2 2019-08-01 22:00:32 Carrots
#after writing the required code and adding a new column to df1
df1.head()
customer_id puchase_date fruit_item baught_veggies_last_30_minutes?
1 2019-08-01 23:55:55 Apples Yes
2 2019-08-01 23:58:32 Bananas No

CodePudding user response:

You can use merge_asof. You want to merge within 30 mins of the purchase date, so you set that using the tolerance parameter. Note that you misspelt purchase as puchase. I spelt it the same way so that you can run it without error.

out = (pd.merge_asof(df1.assign(puchase_date=pd.to_datetime(df1['puchase_date'])).sort_values(by='puchase_date'),
                    df2.assign(puchase_date=pd.to_datetime(df2['puchase_date'])).sort_values(by='puchase_date'), 
                    on='puchase_date', 
                    by='customer_id', 
                    tolerance=pd.Timedelta('30 minute'))
       .rename(columns={'veggies_item':'bought_veggies_last_30_minutes'})
       .assign(bought_veggies_last_30_minutes=lambda x: x['bought_veggies_last_30_minutes']
               .notna().replace({True: 'Yes', False:'No'})))

Output:

   customer_id        puchase_date fruit_item bought_veggies_last_30_minutes
0            1 2019-08-01 23:55:55     Apples                            Yes
1            2 2019-08-01 23:58:32    Bananas                             No
  • Related