Home > Net >  How to merge as-of for date but left-join for another column?
How to merge as-of for date but left-join for another column?

Time:11-23

I have the following dataframes:

assets  = pd.DataFrame(columns = ['date','asset'], data = [[datetime.date(2022,10,21),'SPY'],[datetime.date(2022,10,21),'FTSE'], [datetime.date(2022,11,12),'SPY'],[datetime.date(2022,11,12),'FTSE']])
prices  = pd.DataFrame(columns = ['date','asset', 'price'], data = [[datetime.date(2022,10,11),'SPY',10],[datetime.date(2022,10,11),'FTSE',5],[datetime.date(2022,11,8),'SPY',100],[datetime.date(2022,11,8),'FTSE',50]]

for each asset I want to get the as-of price (at the nearest date). How to do so?

If I have only one asset it is easy:

assets_spy = assets#.loc[assets['asset']=='SPY']
prices_spy = prices#.loc[prices['asset']=='SPY']
assets_spy.index = pd.to_datetime(assets_spy['date'])
prices_spy.index = pd.to_datetime(prices_spy['date'])

merged = pd.merge_asof(assets_spy.sort_index(), 
                prices_spy.sort_index(), 
                direction='nearest',right_index=True,left_index=True)

but if I follow the same logic for multiple assets, it won't match.

CodePudding user response:

The function pandas.merge_asof has an optional parameter named by that you can use for matching using a column or list of columns before performing merge operation. Therefore, you could adapt your code like this:


import pandas as pd
import datetime

assets  = pd.DataFrame(columns = ['date', 'asset'],
                       data = [[datetime.date(2022, 10, 21), 'SPY'],
                               [datetime.date(2022, 10, 21), 'FTSE'],
                               [datetime.date(2022, 11, 12), 'SPY'],
                               [datetime.date(2022, 11, 12), 'FTSE']])

prices  = pd.DataFrame(columns = ['date', 'asset', 'price'],
                       data = [[datetime.date(2022, 10, 11), 'SPY', 10],
                               [datetime.date(2022, 10, 11), 'FTSE', 5],
                               [datetime.date(2022, 11, 8), 'SPY', 100],
                               [datetime.date(2022, 11, 8), 'FTSE', 50]])


merged = pd.merge_asof(
    assets.astype({'date': 'datetime64[ns]'}).sort_values('date').convert_dtypes(),
    prices.astype({'date': 'datetime64[ns]'}).sort_values('date').convert_dtypes(),
    direction = 'nearest',
    on = 'date',
    by = 'asset',
)
merged
# Returns:
#
#         date asset  price
# 0 2022-10-21   SPY     10  <-- Merged using SPY price from '2022-10-11'
# 1 2022-10-21  FTSE      5  <-- Merged using FTSE price from '2022-10-11'
# 2 2022-11-12   SPY    100  <-- Merged using SPY price from '2022-11-08'
# 3 2022-11-12  FTSE     50  <-- Merged using FTSE price from '2022-11-08'

Output Screenshot:

enter image description here

  • Related