I have a DataFrame with an DataTimeIndex and I try to find the maximum between the first timestamp after 00:00 and 00:00 of the upcoming day. Right now I have a workin solution using pd.groupby()
with an index by day.
Here is a minimal example with a maximum value of 24 which moves on hour each day:
df = pd.DataFrame(
{'v':list(range(25))*3},
index=pd.date_range('2020-01-01', freq='1H', periods=25*3)
)
df.index.name = 'datetime'
gg = df.groupby(df.index.copy().tz_localize(None).to_period("D"))
m = pd.merge(gg.idxmax(), gg.max(), on="datetime")
m.columns = ["idxmax", "max"]
>>> m
idxmax max
datetime
2020-01-01 2020-01-01 23:00:00 23
2020-01-02 2020-01-02 00:00:00 24
2020-01-03 2020-01-03 01:00:00 24
2020-01-04 2020-01-04 02:00:00 24
What I am looking for is
idxmax max
datetime
2019-12-31 2020-01-01 00:00:00 0
2020-01-01 2020-01-02 00:00:00 24
2020-01-02 2020-01-03 00:00:00 23
2020-01-03 2020-01-03 01:00:00 24
2020-01-04 2020-01-04 02:00:00 24
How can I create the wanted results?
CodePudding user response:
Use Grouper
with offset
parameter:
df = pd.DataFrame(
{'v':list(range(25))*3},
index=pd.date_range('2020-01-01', freq='1H', periods=25*3)
)
df.index.name = 'datetime'
#if necessary
df.index = df.index.tz_localize(None)
gg = df.groupby(pd.Grouper(freq='D', offset='1H'))
m = pd.merge(gg.idxmax(), gg.max(), on="datetime")
m.columns = ["idxmax", "max"]
print (m)
idxmax max
datetime
2019-12-31 01:00:00 2020-01-01 00:00:00 0
2020-01-01 01:00:00 2020-01-02 00:00:00 24
2020-01-02 01:00:00 2020-01-03 00:00:00 23
2020-01-03 01:00:00 2020-01-03 01:00:00 24
2020-01-04 01:00:00 2020-01-04 02:00:00 24
CodePudding user response:
Ok. I found a solution with an pd.Timdelta
.
df = pd.DataFrame({'v':list(range(25))*3}, index=pd.date_range('2020-01-01', freq='1H', periods=25*3))
df.index.name = 'datetime'
df.index -= pd.Timedelta('1h')
gg = df.groupby(df.index.copy().tz_localize(None).to_period("D"))
m = pd.merge(gg.idxmax() pd.Timedelta('1h'), gg.max(), on="datetime")
m.columns = ["idxmax", "max"]
>>> idxmax max
datetime
2019-12-31 2020-01-01 00:00:00 0
2020-01-01 2020-01-02 00:00:00 24
2020-01-02 2020-01-03 00:00:00 23
2020-01-03 2020-01-03 01:00:00 24
2020-01-04 2020-01-04 02:00:00 24
CodePudding user response:
What about:
df = pd.DataFrame({'v':list(range(25))*3}, index=pd.date_range('2020-01-01', freq='1H', periods=25*3))
df.index.name = 'datetime'
df.index -= pd.Timedelta('1h')
df = pd.merge_asof( df, df.resample('D')['v'].max().rename('v_max'), left_index=True, right_index=True)
df.index = pd.Timedelta('1h')
df