Time series resample seems to result in wrong data

Time:01-11

I have data at 30-minute intervals. When I resample it to 1 hour, I get values that seem too low.

Original data:

2022-12-31 22:00:00+01:00;7.500000
2022-12-31 22:30:00+01:00;8.200000
2022-12-31 23:00:00+01:00;10.800000
2022-12-31 23:30:00+01:00;9.500000
2023-01-01 00:00:00+01:00;12.300000
2023-01-01 00:30:00+01:00;168.399994
2023-01-01 01:00:00+01:00;157.399994
2023-01-01 01:30:00+01:00;73.199997
2023-01-01 02:00:00+01:00;59.700001
2023-01-01 02:30:00+01:00;74.000000

After df = df.resample('h', label='right').mean() I get:

2022-12-31 23:00:00+01:00;7.850000
2023-01-01 00:00:00+01:00;10.150000
2023-01-01 01:00:00+01:00;90.349997
2023-01-01 02:00:00+01:00;115.299995
2023-01-01 03:00:00+01:00;66.850000

Should the value for 01:00:00 not be 162.89?

CodePudding user response:

I think you are confusing the label and closed parameters. If you want to get 162.89, you have to use closed='right':

>>> df.resample('H', closed='right').mean()
2022-12-31 21:00:00+01:00      7.500000
2022-12-31 22:00:00+01:00      9.500000
2022-12-31 23:00:00+01:00     10.900000
2023-01-01 00:00:00+01:00    162.899994  # right value but for 00:00
2023-01-01 01:00:00+01:00     66.449999
2023-01-01 02:00:00+01:00     74.000000
Freq: H, dtype: float64

>>> df.resample('H', closed='right', label='right').mean()
2022-12-31 22:00:00+01:00      7.500000
2022-12-31 23:00:00+01:00      9.500000
2023-01-01 00:00:00+01:00     10.900000
2023-01-01 01:00:00+01:00    162.899994  # right value for 01:00
2023-01-01 02:00:00+01:00     66.449999
2023-01-01 03:00:00+01:00     74.000000
Freq: H, dtype: float64

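label controls which bin edge (left or right) is shown in the index, while closed controls which bin edge is inclusive, and therefore which rows are averaged together.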
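To check this yourself, here is a minimal sketch that rebuilds the sample data from the question and compares the two settings. The variable names are just for illustration, and the +01:00 offset is left out for brevity (an assumption that does not change which rows share an hourly bin). Recent pandas versions prefer the lowercase 'h' alias over 'H', so the sketch uses 'h'.

```python
import pandas as pd

# Half-hourly values copied from the question. The +01:00 offset is dropped
# here for brevity; it does not change which rows share an hourly bin.
index = pd.date_range("2022-12-31 22:00", periods=10, freq="30min")
values = [7.5, 8.2, 10.8, 9.5, 12.3,
          168.399994, 157.399994, 73.199997, 59.700001, 74.0]
s = pd.Series(values, index=index)

# Default for hourly bins is closed='left': each bin is [start, end), so the
# bin labelled 01:00 (label='right') holds the 00:00 and 00:30 rows.
left_closed = s.resample("h", label="right").mean()

# closed='right': each bin is (start, end], so the bin labelled 01:00
# holds the 00:30 and 01:00 rows.
right_closed = s.resample("h", closed="right", label="right").mean()

one_am = pd.Timestamp("2023-01-01 01:00")
print(left_closed.loc[one_am])   # about 90.35  = mean(12.3, 168.4)
print(right_closed.loc[one_am])  # about 162.90 = mean(168.4, 157.4)
```

So the 90.35 in the question is not wrong data: it is the mean over [00:00, 01:00), only labelled with the right edge of that bin.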