Home > Blockchain >  Using Pandas, I need to merge 2 dataframes, from the second df I need only the values with the neare
Using Pandas, I need to merge 2 dataframes, from the second df I need only the values with the neare

Time:10-25

I'm sorry if this is confusing. Newbie here. I have 2 data frames that contain data from different patients. The first dataframe (df1) contains results for testA, the results are recorded many times per day:

   Patient     TestA        Time(A)                         
0  1            617.0     2019-12-07 13:21:15.224157
1  1            53.0      2019-12-07 14:21:15.224157
2  1            40.0      2019-12-07 15:21:15.224157
3  2            35.0      2019-12-07 13:21:15.224157
3  2            42.0      2019-12-07 16:21:15.224157

and then another dataframe (df2) with the results for TestB:

   Patient     TestB        Time(B)                         
0  1            60.0      2019-12-07 12:21:15.224157
1  1            56.0      2019-12-07 12:31:15.224157
2  1            45.0      2019-12-07 14:21:15.224157
3  1            32.0      2019-12-07 15:21:15.224157
3  2            42.0      2019-12-07 16:21:15.224157

df2 is a lot longer than df1 because the results of TestB get registered more often. I need a dataframe that contains the value of TestA, and the value of TestB nearest in time to TimeA, but TimeB must be lower than TimeA. So, for example. If the result of TestA was taken at 6:05am, and I have 3 different values for TestB taken at: 6:01am, 6:02am, and 6:06am I would want the value from 6:02am because it's the nearest to TimeA(6:05am) that is also lower than TimeA. It would look something like this:

   Patient     TestA        Time(A)                    TestB         Time(B)   
     1         617.0         6:05am                     50             6:02
     1          "               "                        "               "
     1      

Please help, thank you

CodePudding user response:

.merge_of is what you need. Make sure Time columns are datetime and sorted prior to merging.

df1["Time(A)"] = pd.to_datetime(df1["Time(A)"])
df1 = df1.sort_values(["Time(A)"], ascending=[1])
df2["Time(B)"] = pd.to_datetime(df2["Time(B)"])
df2 = df2.sort_values(["Time(B)"], ascending=[1])

merged = pd.merge_asof(
    left=df1,
    right=df2,
    left_on="Time(A)",
    right_on="Time(B)",
    by="Patient",
    direction="backward"
)

print(merged)


   Patient  TestA                    Time(A)  TestB                    Time(B)
0        1    617 2019-12-07 13:21:15.224157   56.0 2019-12-07 12:31:15.224157
1        2     35 2019-12-07 13:21:15.224157    NaN                        NaT
2        1     53 2019-12-07 14:21:15.224157   45.0 2019-12-07 14:21:15.224157
3        1     40 2019-12-07 15:21:15.224157   32.0 2019-12-07 15:21:15.224157
4        2     42 2019-12-07 16:21:15.224157   42.0 2019-12-07 16:21:15.224157

CodePudding user response:

You are looking for pandas.merge_asof

Make sure your dates are correct (information gets lost when I copy your data)

df1["Time(A)"] = pd.to_datetime(df1["Time(A)"])
df2["Time(B)"] = pd.to_datetime(df2["Time(B)"])

pd.merge_asof(df1, df2, by="Patient", left_on="Time(A)", right_on="Time(B)", allow_exact_matches=True)


   Patient  TestA                    Time(A)  TestB                    Time(B)
0        1  617.0 2019-12-07 13:21:15.224157   56.0 2019-12-07 12:31:15.224157
1        1   53.0 2019-12-07 14:21:15.224157   45.0 2019-12-07 14:21:15.224157
2        1   40.0 2019-12-07 15:21:15.224157   45.0 2019-12-07 14:21:15.224157
3        2   42.0 2019-12-07 16:21:15.224157   42.0 2019-12-07 16:21:15.224157
  • Related