When I ask pandas to round a datetime to the nearest hour, exact half-hour values are not rounded consistently. Odd hours like 5 have their half-hour (5:30) rounded up, but even hours have their half-hour rounded down, e.g. 5:30 and 6:30 are both rounded to 6.
Why? And how do I get consistent behavior, such as always rounding the half-hour up?
df['nearest_hour'] = df['datetime'].dt.round('H')
df
                        datetime              nearest_hour
141470 2019-10-26 04:45:00+00:00 2019-10-26 05:00:00+00:00
294282 2019-10-26 05:00:00+00:00 2019-10-26 05:00:00+00:00
178460 2019-10-26 05:15:00+00:00 2019-10-26 05:00:00+00:00
92696  2019-10-26 05:30:00+00:00 2019-10-26 06:00:00+00:00
227946 2019-10-26 05:45:00+00:00 2019-10-26 06:00:00+00:00
216924 2019-10-26 06:00:00+00:00 2019-10-26 06:00:00+00:00
167823 2019-10-26 06:15:00+00:00 2019-10-26 06:00:00+00:00
240039 2019-10-26 06:30:00+00:00 2019-10-26 06:00:00+00:00
CodePudding user response:
Quoting from this issue on the pandas GitHub:
This is the expected behavior, albeit unintuitive the first time you encounter it, and is consistent with the rest of the Python ecosystem. The rule for rounding is that exact halfway cases are rounded to the nearest even result.
See also Python 3.x rounding behavior:
for i in [1.5, 2.5, 3.5, 4.5]:
    print(i, round(i))
1.5 2
2.5 2
3.5 4
4.5 4
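The same tie-breaking can be checked on timestamps directly. This is a minimal sketch with made-up sample values (not the question's data); the frequency is spelled "60min" here only because it is an assumption-free way to say "one hour" across pandas versions, where the 'H'/'h' alias has changed.

```python
import pandas as pd

# Hypothetical sample: exact half-hour marks on four consecutive hours (UTC).
ts = pd.Series(pd.to_datetime(
    ["2019-10-26 04:30", "2019-10-26 05:30",
     "2019-10-26 06:30", "2019-10-26 07:30"],
    utc=True,
))

# Ties go to the even hour: 04:30 -> 04:00, 05:30 -> 06:00,
# 06:30 -> 06:00, 07:30 -> 08:00.
rounded = ts.dt.round("60min")
print(pd.DataFrame({"datetime": ts, "rounded": rounded}))
```

Each half-hour lands on the neighboring even hour, which is why 5:30 and 6:30 both end up at 6:00.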
If you nevertheless want exact half-hours to always round up, you can conditionally floor and ceil to the hour:
# Take the ceiling where the minute is 30 or later, otherwise the floor.
dt = df["datetime"].dt
df["nearest_hour"] = dt.ceil("H").where(dt.minute >= 30, dt.floor("H"))
df
                   datetime              nearest_hour
0 2019-10-26 04:45:00+00:00 2019-10-26 05:00:00+00:00
1 2019-10-26 05:00:00+00:00 2019-10-26 05:00:00+00:00
2 2019-10-26 05:15:00+00:00 2019-10-26 05:00:00+00:00
3 2019-10-26 05:30:00+00:00 2019-10-26 06:00:00+00:00
4 2019-10-26 05:45:00+00:00 2019-10-26 06:00:00+00:00
5 2019-10-26 06:00:00+00:00 2019-10-26 06:00:00+00:00
6 2019-10-26 06:15:00+00:00 2019-10-26 06:00:00+00:00
7 2019-10-26 06:30:00+00:00 2019-10-26 07:00:00+00:00
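An alternative that should give the same round-half-up result is to shift every value forward by 30 minutes and then floor. The sketch below is not from the quoted issue; the helper name round_half_up_to_hour and the sample values are hypothetical, and "60min" is again used as the hourly frequency.

```python
import pandas as pd


def round_half_up_to_hour(s: pd.Series) -> pd.Series:
    """Round datetimes to the nearest hour, sending exact half-hours up."""
    # Adding 30 minutes pushes anything at or past the half-hour into the
    # next hour, so flooring afterwards yields round-half-up.
    return (s + pd.Timedelta(minutes=30)).dt.floor("60min")


# Hypothetical sample values covering both an odd and an even hour.
ts = pd.Series(pd.to_datetime(
    ["2019-10-26 05:15", "2019-10-26 05:30",
     "2019-10-26 06:15", "2019-10-26 06:30"],
    utc=True,
))

result = round_half_up_to_hour(ts)
print(pd.DataFrame({"datetime": ts, "nearest_hour": result}))
```

Here 05:30 goes to 06:00 and 06:30 goes to 07:00, regardless of whether the hour is odd or even.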