Consider the following index i
which spans 1.5 days:
import pandas as pd
from typing import Union
i = pd.date_range('2022-01-01', freq='H', periods=36)
The timestamps can be floored to the nearest day start, which gives 24 times '2022-01-01' and 12 times '2022-01-02':
i.floor("D")
# DatetimeIndex(['2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
# '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
# '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
# '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
# '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
# '2022-01-01', '2022-01-01', '2022-01-01', '2022-01-01',
# '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02',
# '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02',
# '2022-01-02', '2022-01-02', '2022-01-02', '2022-01-02'],
# dtype='datetime64[ns]', freq=None)
However, in my current use-case, I want to floor them to '06:00', to get this output:
# DatetimeIndex(['2021-12-31 06:00', '2021-12-31 06:00', '2021-12-31 06:00', '2021-12-31 06:00',
# '2021-12-31 06:00', '2021-12-31 06:00', '2022-01-01 06:00', '2022-01-01 06:00',
# '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',
# '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',
# '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',
# '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',
# '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00', '2022-01-01 06:00',
# '2021-02-01 06:00', '2022-01-01 06:00', '2022-01-02 06:00', '2022-01-02 06:00',
# '2022-01-02 06:00', '2022-01-02 06:00', '2022-01-02 06:00', '2022-01-02 06:00'],
# dtype='datetime64[ns]', freq=None)
Preferably by using the following grouper object:
gr = pd.Grouper(freq='D', offset=pd.Timedelta(hours=6))
Is there a way to do that?
Remarks:
The reason for wanting to use
gr
is that it's used elsewhere in aggregating (pd.Series.groupby
). It'd be great to re-use it here.The use-case includes flooring to '06:00' of the first day of the month, quarter, or year.
CodePudding user response:
There is not much advantage in "reusing" gr
, it's not like reusing a GroupbBy
object: g = df.groupby(...)
that can be expensive to compute.
Nevertheless, looks like you want:
gr = pd.Grouper(freq='D', offset=pd.Timedelta(hours=6))
g = pd.Series(index=i, dtype='float').groupby(gr)
idx = g.size()
out = idx.repeat(idx).index
As a one-liner (python ≥3.8):
out = (idx:=pd.Series(index=i, dtype='float').groupby(gr).size()).repeat(idx).index
output:
DatetimeIndex(['2021-12-31 06:00:00', '2021-12-31 06:00:00',
'2021-12-31 06:00:00', '2021-12-31 06:00:00',
'2021-12-31 06:00:00', '2021-12-31 06:00:00',
'2022-01-01 06:00:00', '2022-01-01 06:00:00',
'2022-01-01 06:00:00', '2022-01-01 06:00:00',
'2022-01-01 06:00:00', '2022-01-01 06:00:00',
'2022-01-01 06:00:00', '2022-01-01 06:00:00',
'2022-01-01 06:00:00', '2022-01-01 06:00:00',
'2022-01-01 06:00:00', '2022-01-01 06:00:00',
'2022-01-01 06:00:00', '2022-01-01 06:00:00',
'2022-01-01 06:00:00', '2022-01-01 06:00:00',
'2022-01-01 06:00:00', '2022-01-01 06:00:00',
'2022-01-01 06:00:00', '2022-01-01 06:00:00',
'2022-01-01 06:00:00', '2022-01-01 06:00:00',
'2022-01-01 06:00:00', '2022-01-01 06:00:00',
'2022-01-02 06:00:00', '2022-01-02 06:00:00',
'2022-01-02 06:00:00', '2022-01-02 06:00:00',
'2022-01-02 06:00:00', '2022-01-02 06:00:00'],
dtype='datetime64[ns]', freq=None)