I am trying to make a column with time duration (hours - first one would be 1.424 hrs for example). My data looks like this.
Clock_Start and Clock_Stop are objects.
I have tried the following:
df['Duration'] = (df_nyc['Clock_Stop'] - df['Clock_Start']).astype('timedelta64[m]').astype(int)
... But I get the following error: TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
Can anybody tell me what I am doing wrong here? Any help would be muchly appreciated :)
CodePudding user response:
Time alone cannot be added or subtracted. You need to convert to Timedelta
. There is a stop time that is before the start time, which I assume is because it goes past midnight. In that case, add 24 hours to the result:
h = (pd.to_timedelta(df["Clock_Stop"]) - pd.to_timedelta(df["Clock_Start"])) / pd.Timedelta(hours=1)
h = h.mask(h < 0, h 24)
CodePudding user response:
you have something like
from datetime import time
import pandas as pd
df = pd.DataFrame({"Clock_Start": [time(20,52), time(23,19,55), time(12,17,30)],
"Clock_Stop": [time(22,17,27), time(23,41,46), time(3,44)]})
df
Clock_Start Clock_Stop
0 20:52:00 22:17:27
1 23:19:55 23:41:46
2 12:17:30 03:44:00
Note that there is an ambiguity here: row 3, is 03:44:00
on the next day? Or maybe two days later? Or a day before?! That's why you can't just add or subtract time. This is true in general, not just if t0 > t1.
But you can remove the ambiguity by using the timedelta
type and specify for instance that if the difference is negative, assume one day apart, else same day. EX:
# convert datetime.time -> pandas.Timedelta
df["Clock_Start"] = pd.to_timedelta(df["Clock_Start"].astype(str))
df["Clock_Stop"] = pd.to_timedelta(df["Clock_Stop"].astype(str))
df["Duration"] = df["Clock_Stop"] - df["Clock_Start"]
# df["Duration"]
# 0 0 days 01:25:27
# 1 0 days 00:21:51
# 2 -1 days 15:26:30
# Name: Duration, dtype: timedelta64[ns]
df.loc[df["Duration"]<pd.Timedelta(0)] = pd.Timedelta(days=1)
# df["Duration"]
# 0 0 days 01:25:27
# 1 0 days 00:21:51
# 2 0 days 15:26:30
# Name: Duration, dtype: timedelta64[ns]
Now you can get other representations of the duration, e.g. fractional hours like
df["Duration"].dt.total_seconds()/3600
0 1.424167
1 0.364167
2 15.441667
Name: Duration, dtype: float64