Home > Software engineering >  How to append a new dataset to existing dataset based on index timeseries condition in Python
How to append a new dataset to existing dataset based on index timeseries condition in Python

Time:12-15

I'm really new to python. please any one can help my problem on how to append a new dataset to existing dataset based on index timeseries condition. I need to add each of row from df2 to df1 based on its time with tolerance <5min

here is the example of data I have

df1

Time A
01/9/2021 06:50 1
01/9/2021 06:55 2
01/9/2021 07:00 3
01/9/2021 07:05 6
01/9/2021 07:10 3
01/9/2021 07:15 2
01/9/2021 07:20 1
01/9/2021 07:25 2

df2

Time B
01/9/2021 06:51 0.6
01/9/2021 06:55 0.2
01/9/2021 07:12 0.3
01/9/2021 07:16 0.6

Expected outcome it will add each of row from df2 that match time with tolerance (let say 4 min) to the row of df1.

df3

Time A B
01/9/2021 06:50 1 0.6
01/9/2021 06:55 2 0.2
01/9/2021 07:00 3 NAN
01/9/2021 07:05 6 NAN
01/9/2021 07:10 3 0.3
01/9/2021 07:15 2 0.6
01/9/2021 07:20 1 NAN
01/9/2021 07:25 2 NAN

really appreciate you help. thank you

CodePudding user response:

One way using pandas.to_datetime with pd.Series.dt.round:

df["Time"] = pd.to_datetime(df["Time"])
df2["Time"] = pd.to_datetime(df2["Time"]).dt.round("5min")

new_df = df.merge(df2, on="Time", how="left")
print(new_df)

Output:

                 Time  A    B
0 2021-01-09 06:50:00  1  0.6
1 2021-01-09 06:55:00  2  0.2
2 2021-01-09 07:00:00  3  NaN
3 2021-01-09 07:05:00  6  NaN
4 2021-01-09 07:10:00  3  0.3
5 2021-01-09 07:15:00  2  0.6
6 2021-01-09 07:20:00  1  NaN
7 2021-01-09 07:25:00  2  NaN
  • Related