I have a data frame (df1) and, for each ID, want to get the most recent survey_date before its start_date, along with the associated score, from another data frame (df2).
import pandas as pd
df1 = pd.DataFrame({'ID' : [1,2],
'start_date':['2018-08-04','2018-08-09']})
df1
df2 = pd.DataFrame({'ID' : [1,1,2,2],
'survey_date':['2018-08-01','2018-08-05','2018-08-08','2018-08-10'],
'score':[200,100, 400, 800]})
df2
Desired output:
ID | start_date | prev_survey_date | score |
---|---|---|---|
1 | 2018-08-04 | 2018-08-01 | 200 |
2 | 2018-08-09 | 2018-08-08 | 400 |
How can I do this in Python?
CodePudding user response:
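You can try pd.merge_asof. It needs datetime (or numeric) keys rather than strings, so convert the date columns first. Both frames must also be sorted by their date key, which they already are here.
df1.start_date = pd.to_datetime(df1.start_date)
df2.survey_date = pd.to_datetime(df2.survey_date)
out = pd.merge_asof(df1, df2, by = 'ID', left_on = 'start_date', right_on = 'survey_date')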
Out[366]:
ID start_date survey_date score
0 1 2018-08-04 2018-08-01 200
1 2 2018-08-09 2018-08-08 400
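For completeness, here is a self-contained sketch of the same approach using the data from the question. Two things in it are my assumptions rather than part of the answer above: I pass allow_exact_matches=False because I read "previous" as strictly before start_date (the default would also accept a survey on the same day), and I rename survey_date to prev_survey_date to match the desired output. The explicit sort_values calls are only a safeguard, since merge_asof expects both frames sorted by their key.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2],
                    'start_date': ['2018-08-04', '2018-08-09']})
df2 = pd.DataFrame({'ID': [1, 1, 2, 2],
                    'survey_date': ['2018-08-01', '2018-08-05',
                                    '2018-08-08', '2018-08-10'],
                    'score': [200, 100, 400, 800]})

# merge_asof cannot match on string keys, so parse the dates first.
df1['start_date'] = pd.to_datetime(df1['start_date'])
df2['survey_date'] = pd.to_datetime(df2['survey_date'])

out = pd.merge_asof(
    df1.sort_values('start_date'),    # left frame sorted by its key
    df2.sort_values('survey_date'),   # right frame sorted by its key
    by='ID',                          # only match rows with the same ID
    left_on='start_date',
    right_on='survey_date',
    direction='backward',             # look for the last survey at or before start_date
    allow_exact_matches=False,        # assumption: exclude a survey on start_date itself
).rename(columns={'survey_date': 'prev_survey_date'})

print(out)
```

If an ID has no earlier survey, prev_survey_date and score come back as missing values for that row instead of the row being dropped.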