Home > Software design >  How to create a Duration column in Python from Start and Stop times?
How to create a Duration column in Python from Start and Stop times?

Time:10-24

I am trying to make a column with time duration (hours - first one would be 1.424 hrs for example). My data looks like this.

enter image description here

Clock_Start and Clock_Stop are objects.

enter image description here

I have tried the following:

df['Duration'] = (df_nyc['Clock_Stop'] - df['Clock_Start']).astype('timedelta64[m]').astype(int)

... But I get the following error: TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

Can anybody tell me what I am doing wrong here? Any help would be muchly appreciated :)

CodePudding user response:

Time alone cannot be added or subtracted. You need to convert to Timedelta. There is a stop time that is before the start time, which I assume is because it goes past midnight. In that case, add 24 hours to the result:

h = (pd.to_timedelta(df["Clock_Stop"]) - pd.to_timedelta(df["Clock_Start"])) / pd.Timedelta(hours=1)
h = h.mask(h < 0, h   24)

CodePudding user response:

you have something like

from datetime import time
import pandas as pd

df = pd.DataFrame({"Clock_Start": [time(20,52), time(23,19,55), time(12,17,30)],
                   "Clock_Stop": [time(22,17,27), time(23,41,46), time(3,44)]})

df
  Clock_Start Clock_Stop
0    20:52:00   22:17:27
1    23:19:55   23:41:46
2    12:17:30   03:44:00

Note that there is an ambiguity here: row 3, is 03:44:00 on the next day? Or maybe two days later? Or a day before?! That's why you can't just add or subtract time. This is true in general, not just if t0 > t1.

But you can remove the ambiguity by using the timedelta type and specify for instance that if the difference is negative, assume one day apart, else same day. EX:

# convert datetime.time -> pandas.Timedelta
df["Clock_Start"] = pd.to_timedelta(df["Clock_Start"].astype(str))
df["Clock_Stop"] = pd.to_timedelta(df["Clock_Stop"].astype(str))

df["Duration"] = df["Clock_Stop"] - df["Clock_Start"]

# df["Duration"]
# 0     0 days 01:25:27
# 1     0 days 00:21:51
# 2   -1 days  15:26:30
# Name: Duration, dtype: timedelta64[ns]

df.loc[df["Duration"]<pd.Timedelta(0)]  = pd.Timedelta(days=1)

# df["Duration"]
# 0   0 days 01:25:27
# 1   0 days 00:21:51
# 2   0 days 15:26:30
# Name: Duration, dtype: timedelta64[ns]

Now you can get other representations of the duration, e.g. fractional hours like

df["Duration"].dt.total_seconds()/3600
0     1.424167
1     0.364167
2    15.441667
Name: Duration, dtype: float64
  • Related