How to group by days with an timeshift in pandas?-CodePudding

I have a DataFrame with an DataTimeIndex and I try to find the maximum between the first timestamp after 00:00 and 00:00 of the upcoming day. Right now I have a workin solution using pd.groupby() with an index by day.

Here is a minimal example with a maximum value of 24 which moves on hour each day:

df = pd.DataFrame(
    {'v':list(range(25))*3},
     index=pd.date_range('2020-01-01', freq='1H', periods=25*3)
)
df.index.name = 'datetime'
gg = df.groupby(df.index.copy().tz_localize(None).to_period("D"))
m = pd.merge(gg.idxmax(), gg.max(), on="datetime")
m.columns = ["idxmax", "max"]
>>> m
                        idxmax  max
datetime                           
2020-01-01 2020-01-01 23:00:00   23
2020-01-02 2020-01-02 00:00:00   24
2020-01-03 2020-01-03 01:00:00   24
2020-01-04 2020-01-04 02:00:00   24

What I am looking for is

                        idxmax  max
datetime                           
2019-12-31 2020-01-01 00:00:00   0
2020-01-01 2020-01-02 00:00:00   24
2020-01-02 2020-01-03 00:00:00   23
2020-01-03 2020-01-03 01:00:00   24
2020-01-04 2020-01-04 02:00:00   24

How can I create the wanted results?

CodePudding user response：

Use Grouper with offset parameter:

df = pd.DataFrame(
    {'v':list(range(25))*3},
     index=pd.date_range('2020-01-01', freq='1H', periods=25*3)
)
df.index.name = 'datetime'

#if necessary
df.index = df.index.tz_localize(None)
gg = df.groupby(pd.Grouper(freq='D', offset='1H'))
m = pd.merge(gg.idxmax(), gg.max(), on="datetime")
m.columns = ["idxmax", "max"]
print (m)
                                 idxmax  max
datetime                                    
2019-12-31 01:00:00 2020-01-01 00:00:00    0
2020-01-01 01:00:00 2020-01-02 00:00:00   24
2020-01-02 01:00:00 2020-01-03 00:00:00   23
2020-01-03 01:00:00 2020-01-03 01:00:00   24
2020-01-04 01:00:00 2020-01-04 02:00:00   24

CodePudding user response：

Ok. I found a solution with an pd.Timdelta.

df = pd.DataFrame({'v':list(range(25))*3}, index=pd.date_range('2020-01-01', freq='1H', periods=25*3))
df.index.name = 'datetime'
df.index -= pd.Timedelta('1h')
gg = df.groupby(df.index.copy().tz_localize(None).to_period("D"))
m = pd.merge(gg.idxmax() pd.Timedelta('1h'), gg.max(), on="datetime")
m.columns = ["idxmax", "max"]
>>>                        idxmax  max
datetime                           
2019-12-31 2020-01-01 00:00:00    0
2020-01-01 2020-01-02 00:00:00   24
2020-01-02 2020-01-03 00:00:00   23
2020-01-03 2020-01-03 01:00:00   24
2020-01-04 2020-01-04 02:00:00   24

CodePudding user response：

What about:

df = pd.DataFrame({'v':list(range(25))*3}, index=pd.date_range('2020-01-01', freq='1H', periods=25*3))
df.index.name = 'datetime'
df.index -= pd.Timedelta('1h')
df = pd.merge_asof( df, df.resample('D')['v'].max().rename('v_max'), left_index=True, right_index=True)
df.index  = pd.Timedelta('1h')
df