Home > Enterprise >  Python: Join two dataframes based on hour and nearest minute of date index
Python: Join two dataframes based on hour and nearest minute of date index

Time:02-18

I have two dataframes with different dates as follows:

df1 = pd.DataFrame(index=['2022-01-01 00:37:57', '2022-01-01 03:49:12', '2022-01-01 09:30:11'], columns = ['price'])
df1['price'] = [10,13,12]
df1.index = df1.index.rename('date')
df1:
                        price
date                      
2022-01-01 00:37:57     10
2022-01-01 03:49:12     13
2022-01-01 09:30:11     12 

df2 = pd.DataFrame(index=['2022-01-01 00:35:00', '2022-01-01 00:47:00', '2022-01-01 00:56:12', '2022-01-01 03:45:00', '2022-01-01 03:50:32',
                        '2022-01-01 09:29:20', '2022-01-01 09:31:21'], columns=['price'])
df2['price'] = [3000,3210, 2999, 3001, 3027, 3021, 3002]
df2.index = df2.index.rename('date')
df2:
                      price
date                      
2022-01-01 00:35:00   3000
2022-01-01 00:47:00   3210
2022-01-01 00:56:12   2999
2022-01-01 03:45:00   3001
2022-01-01 03:50:32   3027
2022-01-01 09:29:20   3021
2022-01-01 09:31:21   3002

I want to left join df1 with df2, df1.join(df2,how='left'), on the hour and nearest minute to get the following:

df:
                        price_x price_y
date
2022-01-01 00:37:57     10      3000
2022-01-01 03:49:12     13      3210
2022-01-01 09:30:11     12      3021

So for example, the last row left joined on the date "2022-01-01 09:29:20" since it is closest to "2022-01-01 09:30:11".

How can this be done?

CodePudding user response:

Try pd.merge_asof() (assuming the index of DateTime type and sorted):

print(
    pd.merge_asof(
        df1,
        df2,
        left_index=True,
        right_index=True,
        direction="nearest",
    )
)

Prints:

                     price_x  price_y
date                                 
2022-01-01 00:37:57       10     3000
2022-01-01 03:49:12       13     3027
2022-01-01 09:30:11       12     3021
  • Related