aggregating on a shifted 24H window-CodePudding

Consider the following example

import datetime
import pandas as pd

df = pd.DataFrame({'var' : np.random.randint(0,10, size = 16993)},
             index = pd.date_range('2021-01-01 00:00:00',
                                   '2021-03-01 00:00:00',
                                   freq ='5 min'))
df.head(10)
Out[272]: 
                     var
2021-01-01 00:00:00    5
2021-01-01 00:05:00    9
2021-01-01 00:10:00    1
2021-01-01 00:15:00    5
2021-01-01 00:20:00    8

This is a classic dataframe with intraday data. I need to aggregate it at the daily level, but I do not want to use the 24 hours from 00:00:00 to 23:59:59 that would be implicitly used if I ran df.groupby(df.index.date)

df.groupby(df.index.date).agg('sum').head()
Out[274]: 
             var
2021-01-01  1440
2021-01-02  1238
2021-01-03  1274
2021-01-04  1277
2021-01-05  1253

Rather, I want to aggregate using a 24H windows that starts at 1:00:00 every day. So the first group will be between 2021-01-01 01:00:00 and 2021-02-01 00:59:59, the second between 2021-02-01 01:00:00 and 2021-03-01 00:59:59, and so on. The starting times can easily be identified, as shown below.

df['flag'] = df.index.time == datetime.time(1,0)

What is the best way to proceed to create the groups? Thanks!

CodePudding user response：

`Resample` with `offset`

The offset parameter is the Timedelta that can be added to the origin to create custom grouping intervals

df.resample('D', offset=pd.Timedelta(hours=1)).sum()

                      var
2020-12-31 01:00:00    72
2021-01-01 01:00:00  1278
2021-01-02 01:00:00  1301
2021-01-03 01:00:00  1291
2021-01-04 01:00:00  1346
....

CodePudding user response：

You can use groupby.sum if you subtract 1H from the grouper:

df.groupby((df.index - pd.Timedelta('1H')).date).sum()

#              var
# 2020-12-31    58
# 2021-01-01  1319
# 2021-01-02  1261
# ...          ...
# 2021-02-26  1286
# 2021-02-27  1246
# 2021-02-28  1240

Resample with offset

`Resample` with `offset`