I have a Pandas dataframe that looks like this :
# date
--- -------------------
0 2022-01-01 08:00:00
1 2022-01-01 08:01:00
2 2022-01-01 08:52:00
My goal is to add a new column that contains a datetime object with the value of the next hour. I looked at the documentation of the ceil function, and it works pretty well in most cases.
Issue
The problem concerns hours that are perfectly round (like the one at #0) :
df["next"] = (df["date"]).dt.ceil("H")
# date next
--- ------------------- -------------------
0 2022-01-01 08:00:00 2022-01-01 08:00:00 <--- wrong, expected 09:00:00
1 2022-01-01 08:01:00 2022-01-01 09:00:00 <--- correct
2 2022-01-01 08:52:00 2022-01-01 09:00:00 <--- correct
Sub-optimal solution
I have come up with the following workaround, but I find it really clumsy :
def nextHour(current):
return pd.date_range(start=current, periods=2, freq="H")[1]
df["next"] = (df["date"]).apply(lambda x: nextHour(x))
I have around 1-2 million rows in my dataset and I find this solution extremely slow compared to the native dt.ceil()
. Is there a better way of doing it ?
CodePudding user response:
This is the way ceil works, it won't jump to the next hour.
What you want seems more like a floor
1h using pandas.Timedelta
:
df['next'] = df['date'].dt.floor('H') pd.Timedelta('1h')
output:
date next
0 2022-01-01 08:00:00 2022-01-01 09:00:00
1 2022-01-01 08:01:00 2022-01-01 09:00:00
2 2022-01-01 08:52:00 2022-01-01 09:00:00
difference of bounds behavior between floor
and ceil
:
date ceil floor
0 2022-01-01 08:00:00 2022-01-01 08:00:00 2022-01-01 08:00:00
1 2022-01-01 08:01:00 2022-01-01 09:00:00 2022-01-01 08:00:00
2 2022-01-01 08:52:00 2022-01-01 09:00:00 2022-01-01 08:00:00
3 2022-01-01 09:00:00 2022-01-01 09:00:00 2022-01-01 09:00:00
4 2022-01-01 09:01:00 2022-01-01 10:00:00 2022-01-01 09:00:00