Home > OS >  Floor `pandas.DatetimeIndex` using `grouper`
Floor `pandas.DatetimeIndex` using `grouper`

Time:10-31

Consider the following index i which spans 1.5 days:

import pandas as pd
from typing import Union
i = pd.date_range('2022-01-01', freq='H', periods=36)

The timestamps can be floored to the nearest day start, which gives 24 times '2022-01-01' and 12 times '2022-01-02':

i.floor("D")
# DatetimeIndex(['2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
#                '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
#                '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
#                '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
#                '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
#                '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
#                '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02',
#                '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02',
#                '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02'],
#               dtype='datetime64[ns]', freq=None)

However, in my current use-case, I want to floor them to '06:00', to get this output:

# DatetimeIndex(['2021-12-31 06:00', '2021-12-31 06:00', '2021-12-31 06:00', '2021-12-31 06:00',
#                '2021-12-31 06:00', '2021-12-31 06:00', '2022-01-01 06:00', '2022-01-01 06:00',
#                '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',               
#                '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',               
#                '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',               
#                '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',               
#                '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',               
#                '2021-02-01 06:00', '2022-01-01 06:00', '2022-01-02 06:00', '2022-01-02 06:00', 
#                '2022-01-02 06:00', '2022-01-02 06:00', '2022-01-02 06:00', '2022-01-02 06:00'],
#               dtype='datetime64[ns]', freq=None)

Preferably by using the following grouper object:

gr = pd.Grouper(freq='D', offset=pd.Timedelta(hours=6))

Is there a way to do that?

Remarks:

  • The reason for wanting to use gr is that it's used elsewhere in aggregating (pd.Series.groupby). It'd be great to re-use it here.

  • The use-case includes flooring to '06:00' of the first day of the month, quarter, or year.

CodePudding user response:

There is not much advantage in "reusing" gr, it's not like reusing a GroupbBy object: g = df.groupby(...) that can be expensive to compute.

Nevertheless, looks like you want:

gr = pd.Grouper(freq='D', offset=pd.Timedelta(hours=6))

g = pd.Series(index=i, dtype='float').groupby(gr)

idx = g.size()

out = idx.repeat(idx).index

As a one-liner (python ≥3.8):

out = (idx:=pd.Series(index=i, dtype='float').groupby(gr).size()).repeat(idx).index

output:

DatetimeIndex(['2021-12-31 06:00:00', '2021-12-31 06:00:00',
               '2021-12-31 06:00:00', '2021-12-31 06:00:00',
               '2021-12-31 06:00:00', '2021-12-31 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-01 06:00:00', '2022-01-01 06:00:00',
               '2022-01-02 06:00:00', '2022-01-02 06:00:00',
               '2022-01-02 06:00:00', '2022-01-02 06:00:00',
               '2022-01-02 06:00:00', '2022-01-02 06:00:00'],
              dtype='datetime64[ns]', freq=None)
  • Related