I'm sorry if this is confusing. Newbie here. I have 2 data frames that contain data from different patients. The first dataframe (df1) contains results for testA, the results are recorded many times per day:
Patient TestA Time(A)
0 1 617.0 2019-12-07 13:21:15.224157
1 1 53.0 2019-12-07 14:21:15.224157
2 1 40.0 2019-12-07 15:21:15.224157
3 2 35.0 2019-12-07 13:21:15.224157
3 2 42.0 2019-12-07 16:21:15.224157
and then another dataframe (df2) with the results for TestB:
Patient TestB Time(B)
0 1 60.0 2019-12-07 12:21:15.224157
1 1 56.0 2019-12-07 12:31:15.224157
2 1 45.0 2019-12-07 14:21:15.224157
3 1 32.0 2019-12-07 15:21:15.224157
3 2 42.0 2019-12-07 16:21:15.224157
df2 is a lot longer than df1 because the results of TestB get registered more often. I need a dataframe that contains the value of TestA, and the value of TestB nearest in time to TimeA, but TimeB must be lower than TimeA. So, for example. If the result of TestA was taken at 6:05am, and I have 3 different values for TestB taken at: 6:01am, 6:02am, and 6:06am I would want the value from 6:02am because it's the nearest to TimeA(6:05am) that is also lower than TimeA. It would look something like this:
Patient TestA Time(A) TestB Time(B)
1 617.0 6:05am 50 6:02
1 " " " "
1
Please help, thank you
CodePudding user response:
.merge_of is what you need. Make sure Time columns are datetime and sorted prior to merging.
df1["Time(A)"] = pd.to_datetime(df1["Time(A)"])
df1 = df1.sort_values(["Time(A)"], ascending=[1])
df2["Time(B)"] = pd.to_datetime(df2["Time(B)"])
df2 = df2.sort_values(["Time(B)"], ascending=[1])
merged = pd.merge_asof(
left=df1,
right=df2,
left_on="Time(A)",
right_on="Time(B)",
by="Patient",
direction="backward"
)
print(merged)
Patient TestA Time(A) TestB Time(B)
0 1 617 2019-12-07 13:21:15.224157 56.0 2019-12-07 12:31:15.224157
1 2 35 2019-12-07 13:21:15.224157 NaN NaT
2 1 53 2019-12-07 14:21:15.224157 45.0 2019-12-07 14:21:15.224157
3 1 40 2019-12-07 15:21:15.224157 32.0 2019-12-07 15:21:15.224157
4 2 42 2019-12-07 16:21:15.224157 42.0 2019-12-07 16:21:15.224157
CodePudding user response:
You are looking for pandas.merge_asof
Make sure your dates are correct (information gets lost when I copy your data)
df1["Time(A)"] = pd.to_datetime(df1["Time(A)"])
df2["Time(B)"] = pd.to_datetime(df2["Time(B)"])
pd.merge_asof(df1, df2, by="Patient", left_on="Time(A)", right_on="Time(B)", allow_exact_matches=True)
Patient TestA Time(A) TestB Time(B)
0 1 617.0 2019-12-07 13:21:15.224157 56.0 2019-12-07 12:31:15.224157
1 1 53.0 2019-12-07 14:21:15.224157 45.0 2019-12-07 14:21:15.224157
2 1 40.0 2019-12-07 15:21:15.224157 45.0 2019-12-07 14:21:15.224157
3 2 42.0 2019-12-07 16:21:15.224157 42.0 2019-12-07 16:21:15.224157