Home > Back-end >  iterating one dataframe and using multiple columns of it as criteria to find value in another datafr
iterating one dataframe and using multiple columns of it as criteria to find value in another datafr

Time:12-11

I have two dataframes Samples and Events

Samples:

 ID     Sample_Date  

0 500001 2021-09-28

1 500002 2021-09-29

2 500003 2021-10-15

Events:

  ID     Event_Date  Event   

0 500001 2021-09-20 BN

1 500001 2021-10-05 BN

2 500003 2021-10-09 AS ...

a desired output example would be:

 ID     Sample_Date  last_Event_to_sample

0 500001 2021-09-28 2021-09-20

1 500002 2021-09-29 NaN

2 500003 2021-10-15 2021-10-09

Samples has a one to many relationship to Events.

Basicly what i wanted to do is to iterate through each row of the Samples Dataframe and use the ID and the Date as a condition to search for the next smaller date in the Events Dataframe for this ID, in order to create a new column in Samples 'last_Event_to_Sample' .

After two days researching i cannot figure out how to approach this. This is the last step to be finally able to create my plots. I would really appreachiate it ,if someone can help me out!

Thank you a lot in advance!

CodePudding user response:

IIUC, you can use pandas.merge_asof :

df_samples["Sample_Date"] = pd.to_datetime(df_samples["Sample_Date"])
df_events["Event_Date"] = pd.to_datetime(df_events["Event_Date"])

out = (
        pd.merge_asof(df_samples, df_events,
                      left_on="Sample_Date", right_on="Event_Date",
                      by="ID", direction="backward")
       )

# Output :

print(out)

       ID Sample_Date Event_Date Event
0  500001  2021-09-28 2021-09-20    BN
1  500002  2021-09-29        NaT   NaN
2  500003  2021-10-15 2021-10-09    AS

With direction="nearest", we get this :

       ID Sample_Date Event_Date Event
0  500001  2021-09-28 2021-10-05    BN
1  500002  2021-09-29        NaT   NaN
2  500003  2021-10-15 2021-10-09    AS

And if needed, you can add a tolerance with tolerance=pd.Timedelta("7d") (for e.g) as a keyword :

       ID Sample_Date Event_Date Event
0  500001  2021-09-28        NaT   NaN
1  500002  2021-09-29        NaT   NaN
2  500003  2021-10-15 2021-10-09    AS

NB : Both dataframes need to be sorted on keys before the merge_asof.

  • Related