I need to sort transaction records by time interval and the data is from two data files but I have no idea how can do this. There has some example and my expected output.
tier_history_data:
code,date_start,date_expiry,tier
1,"2020-01-01 15:15:15","2020-12-31 15:15:15",A
1,"2020-05-23 08:24:57","2021-05-22 08:24:57",C
2,"2020-03-01 10:47:15","2021-02-27 10:47:15",B
2,"2020-09-17 23:14:23","2021-09-16 23:14:23",C
3,"2020-05-01 20:26:19","2021-04-30 20:26:19",C
3,"2020-08-31 12:46:02","2021-08-30 12:46:02",B
transaction_data:
code,transaction_datetime,amount
1,"2020-01-02 13:45:05",20
1,"2020-06-22 12:34:41",230
2,"2020-11-12 15:47:35",50
3,"2020-09-03 18:20:34",10
expected output:
code,tramsaction_datetime,amount,tier
1,"2020-01-02 13:45:05",20,A
1,"2020-06-22 12:34:41",230,C
2,"2020-11-12 15:47:35",50,C
3,"2020-09-03 18:20:34",10,B
Thanks in advance
CodePudding user response:
It rather looks like you want to merge the data. As your intervals are disjoint, this is a perfect use case for merge_asof
.
First ensure to have datetime type and that the data is sorted on the merging dates:
df1['date_start'] = pd.to_datetime(df1['date_start'])
df1['date_expiry'] = pd.to_datetime(df1['date_expiry'])
df2['transaction_datetime'] = pd.to_datetime(df2['transaction_datetime'])
df1 = df1.sort_values(by='date_start')
df2 = df2.sort_values(by='transaction_datetime')
Then perform the merge:
df3 = (
pd.merge_asof(df2, df1, by='code',
left_on='transaction_datetime',
right_on='date_start',
)
.sort_values(by='code')
.drop(['date_start', 'date_expiry'], axis=1)
)
Output:
code transaction_datetime amount tier
0 1 2020-01-02 13:45:05 20 A
1 1 2020-06-22 12:34:41 230 C
3 2 2020-11-12 15:47:35 50 C
2 3 2020-09-03 18:20:34 10 B