I am using resample on a pandas DataFrame with a datetime index, and the resulting DataFrame contains unexpected new datetime values.
The original dataframe:
valid snow
2022-06-01 19:00:00+00:00 NaN
2022-06-01 20:00:00+00:00 1.0
2022-06-01 21:00:00+00:00 2.0
2022-06-01 22:00:00+00:00 3.0
2022-06-01 23:00:00+00:00 4.0
2022-06-02 00:00:00+00:00 5.0
2022-06-02 01:00:00+00:00 6.0
2022-06-02 02:00:00+00:00 7.0
I am applying the following pandas call:
df.resample('3H').apply(np.max)
This returns:
valid snow
2022-06-01 18:00:00+00:00 1.0
2022-06-01 21:00:00+00:00 4.0
2022-06-02 00:00:00+00:00 7.0
The first timestamp should be T19 (19:00), not T18 (18:00), and I am not sure why this is happening. Adding a key to resample does not resolve the issue, and neither does calling .dropna before the resample.
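A minimal sketch that reproduces the behavior. It assumes the frame can be rebuilt with pd.date_range (the original data source is not shown) and uses .max() instead of .apply(np.max); both skip the NaN and gave the same values in the output above. The point is that the default origin, 'start_day', anchors the 3-hour bins at midnight of the first day, so 18:00 is simply the bin edge that precedes 19:00. The lowercase "3h" alias is used because newer pandas versions deprecate "3H".

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question: hourly UTC index named "valid".
idx = pd.date_range("2022-06-01 19:00", periods=8, freq="h", tz="UTC", name="valid")
df = pd.DataFrame({"snow": [np.nan, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}, index=idx)

# Default origin is 'start_day': bin edges fall on 00:00, 03:00, ..., 18:00, 21:00
# of the first day, so the first non-empty bin is labeled 18:00.
default = df.resample("3h").max()

# Passing the default explicitly gives the identical result.
explicit = df.resample("3h", origin="start_day").max()

print(default)
```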
Additionally, when iterating through the groups using
[{group[0]: group[1]} for group in df.resample('3H')]
the 0th group in this list is
{Timestamp('2022-06-01 18:00:00+0000', tz='UTC', freq='3H'): snow
valid
2022-06-01 19:00:00+00:00 NaN
2022-06-01 20:00:00+00:00 1.0}
The group contains one fewer value than I would expect from resample, and the key for this group is not what I would expect either.
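A short sketch, under the same assumption that the frame is rebuilt with pd.date_range, that counts the rows in each group. For a fixed hourly frequency like "3h", each bin is the half-open interval [label, label + 3h) (closed='left', label='left'). With the default origin the first bin is [18:00, 21:00), but the data only starts at 19:00, so that bin holds two rows, which matches the group shown above.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2022-06-01 19:00", periods=8, freq="h", tz="UTC", name="valid")
df = pd.DataFrame({"snow": [np.nan, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}, index=idx)

# Iterating a Resampler yields (bin label, sub-frame) pairs, like a groupby.
sizes = {key: len(group) for key, group in df.resample("3h")}
for key, n in sizes.items():
    print(key, n)

# Anchoring the bins at the first timestamp gives full bins of three rows instead.
sizes_start = {key: len(group) for key, group in df.resample("3h", origin="start")}
```

Only the last bin is short with origin='start', because the data ends at 02:00.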
Answer:
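By default, resample uses origin='start_day', which anchors the bins at midnight of the first day, so the 3-hour edges fall on 00:00, 03:00, ..., 18:00, 21:00 and the first bin is labeled 18:00. If you need the bins to start at the first index value instead, add the origin='start' parameter to DataFrame.resample: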
df = df.resample('3H', origin='start').max()
print(df)
snow
valid
2022-06-01 19:00:00+00:00 2.0
2022-06-01 22:00:00+00:00 5.0
2022-06-02 01:00:00+00:00 7.0
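A sketch of two equivalent ways to get the same bins, again assuming the frame is rebuilt with pd.date_range. The origin and offset parameters both exist on DataFrame.resample in pandas 1.1 and later; origin also accepts a Timestamp, and offset shifts every edge of the default midnight-anchored grid.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2022-06-01 19:00", periods=8, freq="h", tz="UTC", name="valid")
df = pd.DataFrame({"snow": [np.nan, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}, index=idx)

# origin='start': bins anchored at the first index value (19:00).
by_start = df.resample("3h", origin="start").max()

# Equivalent: pass the first timestamp explicitly as the origin.
by_timestamp = df.resample("3h", origin=df.index[0]).max()

# Also equivalent for this data: keep the default midnight origin but shift all
# edges by 1 hour (00:00 + 1h = 01:00), so the grid includes 19:00, 22:00, 01:00.
by_offset = df.resample("3h", offset="1h").max()

print(by_start)
```

The offset variant only matches here because 19:00 is one hour past a multiple of three hours from midnight; origin='start' adapts to whatever the first timestamp is.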