I'm really new to python. please any one can help my problem on how to append a new dataset to existing dataset based on index timeseries condition. I need to add each of row from df2 to df1 based on its time with tolerance <5min
here is the example of data I have
df1
Time | A |
---|---|
01/9/2021 06:50 | 1 |
01/9/2021 06:55 | 2 |
01/9/2021 07:00 | 3 |
01/9/2021 07:05 | 6 |
01/9/2021 07:10 | 3 |
01/9/2021 07:15 | 2 |
01/9/2021 07:20 | 1 |
01/9/2021 07:25 | 2 |
df2
Time | B |
---|---|
01/9/2021 06:51 | 0.6 |
01/9/2021 06:55 | 0.2 |
01/9/2021 07:12 | 0.3 |
01/9/2021 07:16 | 0.6 |
Expected outcome it will add each of row from df2 that match time with tolerance (let say 4 min) to the row of df1.
df3
Time | A | B |
---|---|---|
01/9/2021 06:50 | 1 | 0.6 |
01/9/2021 06:55 | 2 | 0.2 |
01/9/2021 07:00 | 3 | NAN |
01/9/2021 07:05 | 6 | NAN |
01/9/2021 07:10 | 3 | 0.3 |
01/9/2021 07:15 | 2 | 0.6 |
01/9/2021 07:20 | 1 | NAN |
01/9/2021 07:25 | 2 | NAN |
really appreciate you help. thank you
CodePudding user response:
One way using pandas.to_datetime
with pd.Series.dt.round
:
df["Time"] = pd.to_datetime(df["Time"])
df2["Time"] = pd.to_datetime(df2["Time"]).dt.round("5min")
new_df = df.merge(df2, on="Time", how="left")
print(new_df)
Output:
Time A B
0 2021-01-09 06:50:00 1 0.6
1 2021-01-09 06:55:00 2 0.2
2 2021-01-09 07:00:00 3 NaN
3 2021-01-09 07:05:00 6 NaN
4 2021-01-09 07:10:00 3 0.3
5 2021-01-09 07:15:00 2 0.6
6 2021-01-09 07:20:00 1 NaN
7 2021-01-09 07:25:00 2 NaN