Home > Software design >  How to get difference in hours between two timestamp columns in pandas data-frame
How to get difference in hours between two timestamp columns in pandas data-frame

Time:09-22

I am trying to get time difference between start and end timestamp data given in '%H:%M:%S.%f' format but getting value error

Data-
|Start time|End time |
|----------|---------|
|02:48:00  | 03:03:00|
|02:48:00  | 03:03:00|
|22:21:00  | 23:40:00|
|22:21:00  | 23:40:00|
|01:30:00  | 02:54:00|
|09:10:00  | 10:13:00|
|05:31:00  | 06:28:00|
|23:09:00  | -1|
|16:09:00  | 17:29:00|

code I am trying

df3['timespent'] = pd.to_datetime(str(df3['End time'])) - pd.to_datetime(str(df3['Start Time']))

Error-

ValueError: ('Unknown string format:', '0     03:03:00\n1     03:03:00\n2     23:40:00\n3     23:40:00\n4     02:54:00\n        ...   \n86    21:13:00\n87    14:24:00\n88    19:57:00\n89    22:23:00\n90    10:53:00\nName: End time, Length: 91, dtype: object')*

CodePudding user response:

You probably just need to strip out the spaces if the values are already in a DataFrame -

pd.to_datetime(df3['End time'].str.strip()) - pd.to_datetime(df3['Start Time'].str.strip()) 

If the format is not automatically recognized - you can pass in a format string to the to_datetime function

CodePudding user response:

First of all, seeing your data, I would recommend to apply pd.to_timedelta. These can be used to calculate the Timedelta between the start and end and then finallly access the seconds field of the time.

Having said that, your string will probably need some stripping .strip() in order to get the format shown below in the code.

import pandas as pd
from datetime import timedelta

df = pd.DataFrame({
    'start' : ["20:15:12", "08:08:08", "23:59:58"], 
    'end' : ["20:25:52", "11:11:11", "00:01:01"]
})


# convert 'start' and 'end' to timedeltas and then substract start from end
df['diff'] = df.apply(pd.to_timedelta).apply(lambda x: x.end - x.start,axis=1)
# correct those that cross midnight
correct24h = timedelta(hours=24)
df['diff'] = df['diff'].apply(lambda x: x   correct24h if x.delta < 0 else x)

# calculate hours from seconds
df['diff_hours'] = df['diff'].dt.seconds / 60 / 60

df
      start       end            diff  diff_hours
0  20:15:12  20:25:52 0 days 00:10:40    0.177778
1  08:08:08  11:11:11 0 days 03:03:03    3.050833
2  23:59:58  00:01:01 0 days 00:01:03    0.017500
  • Related