I would like to pandas merge_asof join the following dataframes
ll = pd.DataFrame([[pd.to_datetime('2010-01-01')], [pd.to_datetime('2010-02-01')]], columns = ['date_left'])
rr = pd.DataFrame([[pd.to_datetime('2010-01-01'), 12],
[pd.to_datetime('2010-01-01'), 6]], columns = ['date_right', 'variable'])
This is, ll:
date_left
0 2010-01-01
1 2010-02-01
and rr:
date_right variable
0 2010-01-01 12
1 2010-01-01 6
The following
pd.merge_asof(ll, rr, left_on = 'date_left', right_on='date_right', direction='backward')
gets me
date_left date_right variable
0 2010-01-01 2010-01-01 6
1 2010-02-01 2010-01-01 6
but I would like (and expect, as it is a left join)
date_left date_right variable
0 2010-01-01 2010-01-01 6
1 2010-01-01 2010-01-01 12
2 2010-02-01 2010-01-01 6
3 2010-02-01 2010-01-01 12
How can I achieve this result?
---- EDIT ----: Sammywemmy gave the solution to use janitors conditional_join. This works for the minimalistic example I posted above. However, I still want the rest of the merge_asof functionality. With this I mean the following:
ll = pd.DataFrame([[pd.to_datetime('2010-01-01')], [pd.to_datetime('2010-02-01')],[pd.to_datetime('2010-03-01')], [pd.to_datetime('2010-04-01')]], columns = ['date_left'])
ll =
date_left
0 2010-01-01
1 2010-02-01
2 2010-03-01
3 2010-04-01
and
rr = pd.DataFrame([[pd.to_datetime('2010-01-01'), 12],
[pd.to_datetime('2010-01-01'), 6],
[pd.to_datetime('2010-03-01'), 3]], columns = ['date_right', 'variable'])
rr =
date_right variable
0 2010-01-01 12
1 2010-01-01 6
2 2010-03-01 3
Then I would like:
date_left date_right variable
0 2010-01-01 2010-01-01 6
1 2010-01-01 2010-01-01 12
2 2010-02-01 2010-01-01 6
3 2010-02-01 2010-01-01 12
4 2010-03-01 2010-03-01 3
5 2010-04-01 2010-03-01 3
Whereas the conditional join would give me :
date_left date_right variable
0 2010-01-01 2010-01-01 12
1 2010-01-01 2010-01-01 6
2 2010-02-01 2010-01-01 12
3 2010-02-01 2010-01-01 6
4 2010-03-01 2010-01-01 12
5 2010-03-01 2010-01-01 6
6 2010-03-01 2010-03-01 3
7 2010-04-01 2010-01-01 12
8 2010-04-01 2010-01-01 6
9 2010-04-01 2010-03-01 3
thanks
CodePudding user response:
IIUC, pd.merge_asof
, followed by a merge
should suffice:
(pd.merge_asof(ll, rr.date_right, left_on='date_left', right_on = 'date_right')
.merge(rr, on='date_right', how = 'left')
)
date_left date_right variable
0 2010-01-01 2010-01-01 12
1 2010-01-01 2010-01-01 6
2 2010-02-01 2010-01-01 12
3 2010-02-01 2010-01-01 6
this works also for the updated sample question:
(pd.merge_asof(ll, rr.date_right, left_on='date_left', right_on = 'date_right')
.merge(rr, on='date_right', how = 'left')
)
date_left date_right variable
0 2010-01-01 2010-01-01 12
1 2010-01-01 2010-01-01 6
2 2010-02-01 2010-01-01 12
3 2010-02-01 2010-01-01 6
4 2010-03-01 2010-03-01 3
5 2010-04-01 2010-03-01 3