Pandas dataframe resample returning incorrect timestamps

Time:08-09

I am using resample on a pandas dataframe with a datetime index, and the resulting dataframe has unexpected datetime values in its index.

The original dataframe:

valid                      snow
2022-06-01 19:00:00+00:00   NaN
2022-06-01 20:00:00+00:00   1.0
2022-06-01 21:00:00+00:00   2.0
2022-06-01 22:00:00+00:00   3.0
2022-06-01 23:00:00+00:00   4.0
2022-06-02 00:00:00+00:00   5.0
2022-06-02 01:00:00+00:00   6.0
2022-06-02 02:00:00+00:00   7.0

I am applying the following pandas function:

df.resample('3H').apply(np.max)

This returns:

valid                      snow
2022-06-01 18:00:00+00:00   1.0
2022-06-01 21:00:00+00:00   4.0
2022-06-02 00:00:00+00:00   7.0

The first timestamp should be T19, not T18, and I am not sure why this is happening. Passing a key to resample does not fix the issue, and neither does calling .dropna() before the resample.

Additionally, when iterating through the groups using

[{group[0]: group[1]} for group in df.resample('3H')]

the 0th group in this list is

{Timestamp('2022-06-01 18:00:00+0000', tz='UTC', freq='3H'):                            snow
 valid                          
 2022-06-01 19:00:00+00:00   NaN
 2022-06-01 20:00:00+00:00   1.0}

The group contains one fewer value than I would expect from resample, and the key for this group is not what I would expect either.

User response:

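By default, resample uses origin='start_day', so the 3-hour bins are aligned to midnight of the first day (00:00, 03:00, ..., 18:00, 21:00) and the 19:00 row falls into the 18:00 bin. If the bins need to start at the first index value instead, pass origin='start' to DataFrame.resample: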

df = df.resample('3H', origin='start').max()
print(df)
                           snow
valid                          
2022-06-01 19:00:00+00:00   2.0
2022-06-01 22:00:00+00:00   5.0
2022-06-02 01:00:00+00:00   7.0
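
To see the difference side by side, here is a minimal sketch that rebuilds the frame from the question and compares the default binning with origin='start'. The variable names (step, default_bins, start_bins, offset_bins) are mine, not from the question. It passes pd.Timedelta(hours=3) instead of the '3H' string only as an assumption that this avoids alias differences between pandas versions, and the offset variant is an extra alternative that the answer above does not mention.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: hourly UTC timestamps, first value missing.
index = pd.date_range(
    "2022-06-01 19:00", periods=8, freq=pd.Timedelta(hours=1), tz="UTC", name="valid"
)
df = pd.DataFrame({"snow": [np.nan, 1, 2, 3, 4, 5, 6, 7]}, index=index)

step = pd.Timedelta(hours=3)  # same bin width as the '3H' rule

# Default origin='start_day': bin edges are counted from midnight of the first day.
default_bins = df.resample(step).max()

# origin='start': bin edges are counted from the first timestamp in the index.
start_bins = df.resample(step, origin="start").max()

# offset shifts the midnight-based grid by a fixed amount instead.
offset_bins = df.resample(step, offset=pd.Timedelta(hours=1)).max()

print(default_bins.index.hour.tolist())  # [18, 21, 0]
print(start_bins.index.hour.tolist())    # [19, 22, 1]
print(offset_bins.index.hour.tolist())   # [19, 22, 1]
```

origin='start' follows whatever the first timestamp happens to be, while offset keeps the bins on a fixed grid relative to midnight, which may be preferable if the same binning should apply to frames that start at different times.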