I have two dataframes with different dates as follows:
df1 = pd.DataFrame(index=['2022-01-01 00:37:57', '2022-01-01 03:49:12', '2022-01-01 09:30:11'], columns = ['price'])
df1['price'] = [10,13,12]
df1.index = df1.index.rename('date')
df1:
price
date
2022-01-01 00:37:57 10
2022-01-01 03:49:12 13
2022-01-01 09:30:11 12
df2 = pd.DataFrame(index=['2022-01-01 00:35:00', '2022-01-01 00:47:00', '2022-01-01 00:56:12', '2022-01-01 03:45:00', '2022-01-01 03:50:32',
'2022-01-01 09:29:20', '2022-01-01 09:31:21'], columns=['price'])
df2['price'] = [3000,3210, 2999, 3001, 3027, 3021, 3002]
df2.index = df2.index.rename('date')
df2:
price
date
2022-01-01 00:35:00 3000
2022-01-01 00:47:00 3210
2022-01-01 00:56:12 2999
2022-01-01 03:45:00 3001
2022-01-01 03:50:32 3027
2022-01-01 09:29:20 3021
2022-01-01 09:31:21 3002
I want to left join df1 with df2, df1.join(df2,how='left')
, on the hour and nearest minute to get the following:
df:
price_x price_y
date
2022-01-01 00:37:57 10 3000
2022-01-01 03:49:12 13 3210
2022-01-01 09:30:11 12 3021
So for example, the last row left joined on the date "2022-01-01 09:29:20" since it is closest to "2022-01-01 09:30:11".
How can this be done?
CodePudding user response:
Try pd.merge_asof()
(assuming the index of DateTime type and sorted):
print(
pd.merge_asof(
df1,
df2,
left_index=True,
right_index=True,
direction="nearest",
)
)
Prints:
price_x price_y
date
2022-01-01 00:37:57 10 3000
2022-01-01 03:49:12 13 3027
2022-01-01 09:30:11 12 3021