Home > Enterprise >  how to match datetime pandas column with range of times in another datetime column
how to match datetime pandas column with range of times in another datetime column

Time:01-06

I have df1 with columns 'Reported Date' and 'order',df2 has columns 'lastoccurance' and 'orders', I want to merge df1 and df2 with condition lastoccurrence = Reported date or - 30 minutes(range of -/ 30 minutes) and orders = order Note: Reports date and lastoccurrence are datetime columns

ex: Input

df1

Reported Date            order          
12/14/2022  6:10:32 PM        A
9/15/2022  2:45:57 AM         B
9/15/2022  11:08:26 AM        C

df2

lastoccurrence          orders
12/14/2022  6:15:35 PM        A
12/14/2022  6:00:35 PM        A
12/14/2022  5:40:35 PM        A
12/14/2022  6:40:35 PM        A
12/14/2022  6:10:32 PM        B
9/15/2022  11:20:26 AM        C
9/15/2022  11:08:26 AM        A

Output

df3

lastoccurrence          orders
12/14/2022  6:15:35 PM        A
12/14/2022  6:00:35 PM        A
12/14/2022  5:40:35 PM        A
12/14/2022  6:40:35 PM        A
9/15/2022  11:20:26 AM        C

CodePudding user response:

First convert both columns to datetimes, then mapping Reported Date by orders by Series.map, so possible compare by Series.between in tolerance:

df1['Reported Date'] = pd.to_datetime(df1['Reported Date'])
df2['lastoccurrence'] = pd.to_datetime(df2['lastoccurrence'])

s = df2['orders'].map(df1.set_index('order')['Reported Date'])

td =  pd.Timedelta('30 min')
df = df2[df2['lastoccurrence'].between(s - td, s   td)]
print (df)
       lastoccurrence orders
0 2022-12-14 18:15:35      A
1 2022-12-14 18:00:35      A
2 2022-12-14 17:40:35      A
5 2022-09-15 11:20:26      C

Or use merge_asof with tolerance parameter and direction='nearest':

df1['Reported Date'] = pd.to_datetime(df1['Reported Date'])
df2['lastoccurrence'] = pd.to_datetime(df2['lastoccurrence'])


df = pd.merge_asof(df2.sort_values('lastoccurrence'), 
                   df1.sort_values('Reported Date'), 
                   left_by='orders',
                   right_by='order', 
                   left_on='lastoccurrence', 
                   right_on='Reported Date', 
                   tolerance=pd.Timedelta('30 min'),
                   direction='nearest').dropna(subset=['order']).drop(['Reported Date','order'], axis=1)

print (df)
       lastoccurrence orders
1 2022-09-15 11:20:26      C
2 2022-12-14 17:40:35      A
3 2022-12-14 18:00:35      A
5 2022-12-14 18:15:35      A
  • Related