import pandas as pd
df = pd.DataFrame(
{
'StartDate':['2020-01-01 00:00:00-04:00', '2020-01-01 01:00:00-04:00', '2020-01-01 01:55:00-04:00', '2020-01-02 02:00:00-02:00', '2020-01-02 02:00:00-04:00'],
'Weight':[100, 110, 120, 125, 155]
}
)
df['StartDate'] = pd.to_datetime(df['StartDate'])
df
I want to group the data by the hour and sum up the Weight column. So, the end result would be a df with 3 rows: current index 0, current indexes 1&2, current indexes 3&4.
I came across the Grouper function and I tried the following but it didn't work:
df = df.groupby(pd.Grouper(key='StartDate', freq='H')).sum()
I get the following error:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
Does anyone know what I'm doing wrong or can someone provide a solution?
Thanks
CodePudding user response:
You first need to convert to datetime, taking into account the timezones:
df['StartDate'] = pd.to_datetime(df['StartDate'], utc=True)
df.groupby(pd.Grouper(key='StartDate', freq='H')).sum()
Output:
Weight
StartDate
2020-01-01 04:00:00 00:00 100
2020-01-01 05:00:00 00:00 230
2020-01-01 06:00:00 00:00 0
2020-01-01 07:00:00 00:00 0
2020-01-01 08:00:00 00:00 0
2020-01-01 09:00:00 00:00 0
2020-01-01 10:00:00 00:00 0
2020-01-01 11:00:00 00:00 0
2020-01-01 12:00:00 00:00 0
2020-01-01 13:00:00 00:00 0
2020-01-01 14:00:00 00:00 0
2020-01-01 15:00:00 00:00 0
2020-01-01 16:00:00 00:00 0
2020-01-01 17:00:00 00:00 0
2020-01-01 18:00:00 00:00 0
2020-01-01 19:00:00 00:00 0
2020-01-01 20:00:00 00:00 0
2020-01-01 21:00:00 00:00 0
2020-01-01 22:00:00 00:00 0
2020-01-01 23:00:00 00:00 0
2020-01-02 00:00:00 00:00 0
2020-01-02 01:00:00 00:00 0
2020-01-02 02:00:00 00:00 0
2020-01-02 03:00:00 00:00 0
2020-01-02 04:00:00 00:00 125
2020-01-02 05:00:00 00:00 0
2020-01-02 06:00:00 00:00 155
without "blanks"
df.groupby(pd.to_datetime(df['StartDate'], utc=True).dt.floor('h'))['Weight'].sum()
StartDate
2020-01-01 04:00:00 00:00 100
2020-01-01 05:00:00 00:00 230
2020-01-02 04:00:00 00:00 125
2020-01-02 06:00:00 00:00 155
Name: Weight, dtype: int64