Home > front end >  pandas merge asof with more than one match
pandas merge asof with more than one match

Time:01-14

I would like to pandas merge_asof join the following dataframes

ll = pd.DataFrame([[pd.to_datetime('2010-01-01')], [pd.to_datetime('2010-02-01')]], columns = ['date_left'])
rr = pd.DataFrame([[pd.to_datetime('2010-01-01'), 12],
                   [pd.to_datetime('2010-01-01'), 6]], columns = ['date_right', 'variable'])

This is, ll:

    date_left
0   2010-01-01
1   2010-02-01

and rr:

    date_right  variable
0   2010-01-01  12
1   2010-01-01  6

The following

pd.merge_asof(ll, rr, left_on = 'date_left', right_on='date_right', direction='backward')

gets me

    date_left   date_right  variable
0   2010-01-01  2010-01-01  6
1   2010-02-01  2010-01-01  6

but I would like (and expect, as it is a left join)

    date_left   date_right  variable
0   2010-01-01  2010-01-01  6
1   2010-01-01  2010-01-01  12
2   2010-02-01  2010-01-01  6
3   2010-02-01  2010-01-01  12

How can I achieve this result?

---- EDIT ----: Sammywemmy gave the solution to use janitors conditional_join. This works for the minimalistic example I posted above. However, I still want the rest of the merge_asof functionality. With this I mean the following:

ll = pd.DataFrame([[pd.to_datetime('2010-01-01')], [pd.to_datetime('2010-02-01')],[pd.to_datetime('2010-03-01')], [pd.to_datetime('2010-04-01')]], columns = ['date_left'])

ll =

    date_left
0   2010-01-01
1   2010-02-01
2   2010-03-01
3   2010-04-01

and

rr = pd.DataFrame([[pd.to_datetime('2010-01-01'), 12],
                   [pd.to_datetime('2010-01-01'), 6],
                   [pd.to_datetime('2010-03-01'), 3]], columns = ['date_right', 'variable'])

rr =

date_right  variable
0   2010-01-01  12
1   2010-01-01  6
2   2010-03-01  3

Then I would like:

    date_left   date_right  variable
0   2010-01-01  2010-01-01  6
1   2010-01-01  2010-01-01  12
2   2010-02-01  2010-01-01  6
3   2010-02-01  2010-01-01  12
4   2010-03-01  2010-03-01  3
5   2010-04-01  2010-03-01  3

Whereas the conditional join would give me :

    date_left   date_right  variable
0   2010-01-01  2010-01-01  12
1   2010-01-01  2010-01-01  6
2   2010-02-01  2010-01-01  12
3   2010-02-01  2010-01-01  6
4   2010-03-01  2010-01-01  12
5   2010-03-01  2010-01-01  6
6   2010-03-01  2010-03-01  3
7   2010-04-01  2010-01-01  12
8   2010-04-01  2010-01-01  6
9   2010-04-01  2010-03-01  3

thanks

CodePudding user response:

IIUC, pd.merge_asof, followed by a merge should suffice:

(pd.merge_asof(ll, rr.date_right, left_on='date_left', right_on = 'date_right')
   .merge(rr, on='date_right', how = 'left')
)
   date_left date_right  variable
0 2010-01-01 2010-01-01        12
1 2010-01-01 2010-01-01         6
2 2010-02-01 2010-01-01        12
3 2010-02-01 2010-01-01         6

this works also for the updated sample question:

(pd.merge_asof(ll, rr.date_right, left_on='date_left', right_on = 'date_right')
   .merge(rr, on='date_right', how = 'left')
)

   date_left date_right  variable
0 2010-01-01 2010-01-01        12
1 2010-01-01 2010-01-01         6
2 2010-02-01 2010-01-01        12
3 2010-02-01 2010-01-01         6
4 2010-03-01 2010-03-01         3
5 2010-04-01 2010-03-01         3
  •  Tags:  
  • Related