Home > database >  Pandas merge_asof Question: one-to-multiple merge
Pandas merge_asof Question: one-to-multiple merge

Time:06-28

I am trying to merge two dataframes by nearest time point. I tried merge_asof. However, the issue is two entries have the same time point, then merge_asof will only merge one entry into the final dataframe. Here is an example

df_A:

  time     ticker     price
09:10:13    VOD       110.96  

df_B:

  time     ticker     volume
09:10:12    VOD       35412
09:10:12    VOD       12343  

I tried


pd.merge(df_A, df_B, how = 'left', on = ['time'], by=['ticker'], direction='backward')

And it only returns

  time     ticker     price    volume
09:10:13    VOD       110.96    12343  

As you can see one entry/row is deleted here. I want to use merge_asof and keep all entries under the same nearest time point.

Ideal output:

  time     ticker    price      volume
09:10:12    VOD     110.96      35412
09:10:12    VOD     110.96      12343  

Any advise ? Thank you!

CodePudding user response:

You need to perform a double merge, first with the merge_asof, then a classical merge. The trick is to identify the duplicates in the asof merged right DataFrame:

df_B2 = df_B.assign(idx=df_B.groupby('time').ngroup())

out = (pd
 .merge_asof(df_A, df_B2, on='time', by='ticker', direction='backward')
 .merge(df_B2, on=['ticker', 'idx'])
 #.drop(columns='idx')
)

NB. I used datetime columns for the example

output:

               time_x ticker   price  volume_x  idx              time_y  volume_y
0 2022-06-27 09:10:13    VOD  110.96     12343    0 2022-06-27 09:10:12     35412
1 2022-06-27 09:10:13    VOD  110.96     12343    0 2022-06-27 09:10:12     12343

used input:

df_A = pd.DataFrame({'time': [Timestamp('2022-06-27 09:10:13')],
                     'ticker': ['VOD'], 'price': [110.96]})
df_B = pd.DataFrame({'time': [Timestamp('2022-06-27 09:10:12'), Timestamp('2022-06-27 09:10:12')],
                     'ticker': ['VOD', 'VOD'],
                     'volume': [35412, 12343]})
                    
  • Related