Home > Back-end >  pandas merge two dataframe by key datetime and an interval
pandas merge two dataframe by key datetime and an interval

Time:02-11

I need to sort transaction records by time interval and the data is from two data files but I have no idea how can do this. There has some example and my expected output.

tier_history_data:

code,date_start,date_expiry,tier
1,"2020-01-01 15:15:15","2020-12-31 15:15:15",A
1,"2020-05-23 08:24:57","2021-05-22 08:24:57",C
2,"2020-03-01 10:47:15","2021-02-27 10:47:15",B
2,"2020-09-17 23:14:23","2021-09-16 23:14:23",C
3,"2020-05-01 20:26:19","2021-04-30 20:26:19",C
3,"2020-08-31 12:46:02","2021-08-30 12:46:02",B

transaction_data:

code,transaction_datetime,amount
1,"2020-01-02 13:45:05",20
1,"2020-06-22 12:34:41",230
2,"2020-11-12 15:47:35",50
3,"2020-09-03 18:20:34",10

expected output:

code,tramsaction_datetime,amount,tier
1,"2020-01-02 13:45:05",20,A
1,"2020-06-22 12:34:41",230,C
2,"2020-11-12 15:47:35",50,C
3,"2020-09-03 18:20:34",10,B

Thanks in advance

CodePudding user response:

It rather looks like you want to merge the data. As your intervals are disjoint, this is a perfect use case for merge_asof.

First ensure to have datetime type and that the data is sorted on the merging dates:

df1['date_start'] = pd.to_datetime(df1['date_start'])
df1['date_expiry'] = pd.to_datetime(df1['date_expiry'])
df2['transaction_datetime'] = pd.to_datetime(df2['transaction_datetime'])

df1 = df1.sort_values(by='date_start')
df2 = df2.sort_values(by='transaction_datetime')

Then perform the merge:

df3 = (
 pd.merge_asof(df2, df1, by='code',
               left_on='transaction_datetime',
               right_on='date_start',
               )
   .sort_values(by='code')
   .drop(['date_start', 'date_expiry'], axis=1)
 )

Output:

   code transaction_datetime  amount tier
0     1  2020-01-02 13:45:05      20    A
1     1  2020-06-22 12:34:41     230    C
3     2  2020-11-12 15:47:35      50    C
2     3  2020-09-03 18:20:34      10    B
  • Related