I have df1 with columns 'Reported Date' and 'order',df2 has columns 'lastoccurance' and 'orders', I want to merge df1 and df2 with condition lastoccurrence = Reported date or - 30 minutes(range of -/ 30 minutes) and orders = order Note: Reports date and lastoccurrence are datetime columns
ex: Input
df1
Reported Date order
12/14/2022 6:10:32 PM A
9/15/2022 2:45:57 AM B
9/15/2022 11:08:26 AM C
df2
lastoccurrence orders
12/14/2022 6:15:35 PM A
12/14/2022 6:00:35 PM A
12/14/2022 5:40:35 PM A
12/14/2022 6:40:35 PM A
12/14/2022 6:10:32 PM B
9/15/2022 11:20:26 AM C
9/15/2022 11:08:26 AM A
Output
df3
lastoccurrence orders
12/14/2022 6:15:35 PM A
12/14/2022 6:00:35 PM A
12/14/2022 5:40:35 PM A
12/14/2022 6:40:35 PM A
9/15/2022 11:20:26 AM C
CodePudding user response:
First convert both columns to datetimes, then mapping Reported Date
by orders
by Series.map
, so possible compare by Series.between
in tolerance:
df1['Reported Date'] = pd.to_datetime(df1['Reported Date'])
df2['lastoccurrence'] = pd.to_datetime(df2['lastoccurrence'])
s = df2['orders'].map(df1.set_index('order')['Reported Date'])
td = pd.Timedelta('30 min')
df = df2[df2['lastoccurrence'].between(s - td, s td)]
print (df)
lastoccurrence orders
0 2022-12-14 18:15:35 A
1 2022-12-14 18:00:35 A
2 2022-12-14 17:40:35 A
5 2022-09-15 11:20:26 C
Or use merge_asof
with tolerance parameter and direction='nearest'
:
df1['Reported Date'] = pd.to_datetime(df1['Reported Date'])
df2['lastoccurrence'] = pd.to_datetime(df2['lastoccurrence'])
df = pd.merge_asof(df2.sort_values('lastoccurrence'),
df1.sort_values('Reported Date'),
left_by='orders',
right_by='order',
left_on='lastoccurrence',
right_on='Reported Date',
tolerance=pd.Timedelta('30 min'),
direction='nearest').dropna(subset=['order']).drop(['Reported Date','order'], axis=1)
print (df)
lastoccurrence orders
1 2022-09-15 11:20:26 C
2 2022-12-14 17:40:35 A
3 2022-12-14 18:00:35 A
5 2022-12-14 18:15:35 A