I have DF1 with customer_id, datetime and fruites purchases, and DF2 with customer_id, datetime and Veggies purchases, how to check if within the past 30 minutes of the fruit purchases a customer did a Veggies puchase?
df1.head()
customer_id | puchase_date | fruit_item |
---|---|---|
1 | 2019-08-01 23:55:55 | Apples |
2 | 2019-08-01 23:58:32 | Bananas |
df2.head()
customer_id | puchase_date | veggies_item |
---|---|---|
1 | 2019-08-01 23:44:55 | Eggplants |
2 | 2019-08-01 22:00:32 | Carrots |
#after writing the required code and adding a new column to df1
df1.head()
customer_id | puchase_date | fruit_item | baught_veggies_last_30_minutes? |
---|---|---|---|
1 | 2019-08-01 23:55:55 | Apples | Yes |
2 | 2019-08-01 23:58:32 | Bananas | No |
CodePudding user response:
You can use merge_asof
. You want to merge within 30 mins of the purchase date, so you set that using the tolerance
parameter. Note that you misspelt purchase
as puchase
. I spelt it the same way so that you can run it without error.
out = (pd.merge_asof(df1.assign(puchase_date=pd.to_datetime(df1['puchase_date'])).sort_values(by='puchase_date'),
df2.assign(puchase_date=pd.to_datetime(df2['puchase_date'])).sort_values(by='puchase_date'),
on='puchase_date',
by='customer_id',
tolerance=pd.Timedelta('30 minute'))
.rename(columns={'veggies_item':'bought_veggies_last_30_minutes'})
.assign(bought_veggies_last_30_minutes=lambda x: x['bought_veggies_last_30_minutes']
.notna().replace({True: 'Yes', False:'No'})))
Output:
customer_id puchase_date fruit_item bought_veggies_last_30_minutes
0 1 2019-08-01 23:55:55 Apples Yes
1 2 2019-08-01 23:58:32 Bananas No