I have two dataframes Samples and Events
Samples:
ID Sample_Date
0 500001 2021-09-28
1 500002 2021-09-29
2 500003 2021-10-15
Events:
ID Event_Date Event
0 500001 2021-09-20 BN
1 500001 2021-10-05 BN
2 500003 2021-10-09 AS ...
a desired output example would be:
ID Sample_Date last_Event_to_sample
0 500001 2021-09-28 2021-09-20
1 500002 2021-09-29 NaN
2 500003 2021-10-15 2021-10-09
Samples has a one to many relationship to Events.
Basicly what i wanted to do is to iterate through each row of the Samples Dataframe and use the ID and the Date as a condition to search for the next smaller date in the Events Dataframe for this ID, in order to create a new column in Samples 'last_Event_to_Sample' .
After two days researching i cannot figure out how to approach this. This is the last step to be finally able to create my plots. I would really appreachiate it ,if someone can help me out!
Thank you a lot in advance!
CodePudding user response:
IIUC, you can use pandas.merge_asof
:
df_samples["Sample_Date"] = pd.to_datetime(df_samples["Sample_Date"])
df_events["Event_Date"] = pd.to_datetime(df_events["Event_Date"])
out = (
pd.merge_asof(df_samples, df_events,
left_on="Sample_Date", right_on="Event_Date",
by="ID", direction="backward")
)
# Output :
print(out)
ID Sample_Date Event_Date Event
0 500001 2021-09-28 2021-09-20 BN
1 500002 2021-09-29 NaT NaN
2 500003 2021-10-15 2021-10-09 AS
With direction="nearest"
, we get this :
ID Sample_Date Event_Date Event
0 500001 2021-09-28 2021-10-05 BN
1 500002 2021-09-29 NaT NaN
2 500003 2021-10-15 2021-10-09 AS
And if needed, you can add a tolerance with tolerance=pd.Timedelta("7d")
(for e.g) as a keyword :
ID Sample_Date Event_Date Event
0 500001 2021-09-28 NaT NaN
1 500002 2021-09-29 NaT NaN
2 500003 2021-10-15 2021-10-09 AS
NB : Both dataframes need to be sorted on keys before the merge_asof.