I have data at 30-minute intervals. When I resample it to 1 hour, I get values that look too low.
Original data:
2022-12-31 22:00:00+01:00;7.500000
2022-12-31 22:30:00+01:00;8.200000
2022-12-31 23:00:00+01:00;10.800000
2022-12-31 23:30:00+01:00;9.500000
2023-01-01 00:00:00+01:00;12.300000
2023-01-01 00:30:00+01:00;168.399994
2023-01-01 01:00:00+01:00;157.399994
2023-01-01 01:30:00+01:00;73.199997
2023-01-01 02:00:00+01:00;59.700001
2023-01-01 02:30:00+01:00;74.000000
After df = df.resample('h', label='right').mean()
I get:
2022-12-31 23:00:00+01:00;7.850000
2023-01-01 00:00:00+01:00;10.150000
2023-01-01 01:00:00+01:00;90.349997
2023-01-01 02:00:00+01:00;115.299995
2023-01-01 03:00:00+01:00;66.850000
Should the value for 01:00:00 not be 162.89?
Answer:
I think you are confusing the label and closed parameters. If you want to get 162.89, you have to use closed='right':
>>> df.resample('H', closed='right').mean()
2022-12-31 21:00:00+01:00      7.500000
2022-12-31 22:00:00+01:00      9.500000
2022-12-31 23:00:00+01:00     10.900000
2023-01-01 00:00:00+01:00    162.899994  # right value, but labeled 00:00
2023-01-01 01:00:00+01:00     66.449999
2023-01-01 02:00:00+01:00     74.000000
Freq: H, dtype: float64
>>> df.resample('H', closed='right', label='right').mean()
2022-12-31 22:00:00+01:00      7.500000
2022-12-31 23:00:00+01:00      9.500000
2023-01-01 00:00:00+01:00     10.900000
2023-01-01 01:00:00+01:00    162.899994  # right value, labeled 01:00
2023-01-01 02:00:00+01:00     66.449999
2023-01-01 03:00:00+01:00     74.000000
Freq: H, dtype: float64
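label controls the display (which bin edge is used as the index label), while closed controls the values (which bin edge is inclusive, and therefore which rows fall into each bin). With the default closed='left' for hourly bins, the row labeled 01:00 with label='right' covers [00:00, 01:00), so it averages 12.3 and 168.4 and gives 90.35. With closed='right' it covers (00:00, 01:00], so it averages 168.4 and 157.4 and gives 162.9.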
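If you want to check this yourself, here is a minimal sketch that rebuilds the half-hourly data from the question and compares the two settings. It assumes the timestamps carry a fixed +01:00 offset and that the data lives in a Series (the names s, left_closed and right_closed are mine; the question does not show how df was built).

```python
from datetime import timedelta, timezone

import pandas as pd

# Assumption: fixed UTC+01:00 offset, as suggested by the timestamps in the question.
tz = timezone(timedelta(hours=1))

# Rebuild the ten half-hourly readings from the question.
index = pd.date_range("2022-12-31 22:00", periods=10, freq="30min", tz=tz)
values = [7.5, 8.2, 10.8, 9.5, 12.3, 168.4, 157.4, 73.2, 59.7, 74.0]
s = pd.Series(values, index=index)

# Default for hourly bins is closed='left': the bin labeled 01:00 is [00:00, 01:00).
left_closed = s.resample("h", label="right").mean()

# closed='right': the bin labeled 01:00 is (00:00, 01:00].
right_closed = s.resample("h", closed="right", label="right").mean()

one_am = pd.Timestamp("2023-01-01 01:00", tz=tz)
print(left_closed.loc[one_am])   # mean of 12.3 and 168.4, about 90.35
print(right_closed.loc[one_am])  # mean of 168.4 and 157.4, about 162.9
```

Which one is "right" depends on what a timestamp means in your data. If each reading describes the 30 minutes ending at its timestamp, closed='right' with label='right' is usually what you want.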