Home > other >  Comparing datetimes between different data frames
Comparing datetimes between different data frames

Time:02-27

I have two different data frames.

df1= 
index Time
0   2009-09-13 01:17:00
1   2009-09-13 02:17:00
2   2009-09-13 03:17:00
3   2009-09-13 04:17:00
4   2009-09-13 05:17:00
............

This is a time-series data with hourly interval spanning in total 10 days. and

df2=
    Report Time                 x
0   2009-09-13 01:17:00         1
1   2009-09-13 02:20:00         27
2   2009-09-13 02:25:00         1
3   2009-09-13 05:33:00         100
..............

I want to iterate the "df1" hourly time data through "df2" and do the summation of "x" to its hourly value. Then The hourly data will store the x value.

The output should look like this:

index Time                    x
0   2009-09-13 01:17:00       1
1   2009-09-13 02:17:00      28
2   2009-09-13 03:17:00       0
3   2009-09-13 04:17:00       0
4   2009-09-13 05:17:00       100

Note that all of the x value should be added together within the same hour for the dataset "df1".

I have tried to implement for loop but couldn't complete it.

CodePudding user response:

Use DatetimeIndex.floor to get same time between the 2 dataframes:

# Ensure the columns have datetime64 dtype
df1['Time'] = pd.to_datetime(df1['Time'])
df2['Report Time'] = pd.to_datetime(df2['Report Time'])

out = (
    df1.merge(df2.groupby(df2['Report Time'].dt.floor('H')).sum(), 
            left_on=df1['Time'].dt.floor('H'), right_index=True, how='left')
      .fillna(0)
)

Output:

>>> out
                 Time      x
0 2009-09-13 01:17:00    1.0
1 2009-09-13 02:17:00   28.0
2 2009-09-13 03:17:00    0.0
3 2009-09-13 04:17:00    0.0
4 2009-09-13 05:17:00  100.0

Details:

>>> df2.groupby(df2['Report Time'].dt.floor('H')).sum()
                       x
Report Time             
2009-09-13 01:00:00    1
2009-09-13 02:00:00   28
2009-09-13 05:00:00  100

>>> df1['Time'].dt.floor('H')
0   2009-09-13 01:00:00
1   2009-09-13 02:00:00
2   2009-09-13 03:00:00
3   2009-09-13 04:00:00
4   2009-09-13 05:00:00
Name: Time, dtype: datetime64[ns]

CodePudding user response:

You need to perform a merge_asof:

df1['Time'] = pd.to_datetime(df1['Time'])
df2['Report Time'] = pd.to_datetime(df2['Report Time'])

(pd
 .merge_asof(df2, df1, left_on='Report Time', right_on='Time')
 .groupby('Time')
 .agg({'x': 'sum'})
 .reindex(df1['Time'], fill_value=0)
 .reset_index()
 )

Output:

                 Time    x
0 2009-09-13 01:17:00    1
1 2009-09-13 02:17:00   28
2 2009-09-13 03:17:00    0
3 2009-09-13 04:17:00    0
4 2009-09-13 05:17:00  100
  • Related