Home > other >  Convert string duration column to time in hours & minutes
Convert string duration column to time in hours & minutes

Time:09-06

I have a data frame which contain two columns: time column df["Start_Time"] in the format of %H:%M:%S & a duration column df["Duration"] which shows the duration that was spent doing a specific activity. This is a sample of the data:

    Start_Time  Duration
0   07:30:00    03:00
1   07:30:00    06:00
2   07:30:00    07:00
3   07:30:00    28:00
4   08:50:00    50:00
5   11:30:00    32:00
6   10:30:00    24:00
7   06:45:00    15:55

My end goal is to sum the duration to the start time, figure out the end_time & how many days we passed doing this activity. The problem is I don't know how to convert the column Duration into time column. Duration is a string column. I know I can't use pd.to_datetime(). I tried using strftime but I had errors. Any ideas on how to convert the column duration ?

CodePudding user response:

You can use pd.to_datetime(format=...) and pd.to_timedelta() to do this. In your example you can do the following. We first need to add '00:' add the beginning of your column Duration as this will represent the hours (now it can be used in pd.to_timedelta()).

df['delta'] = '00:'   df['Duration']
df['start'] = pd.to_datetime(df.Start_Time, format='%H:%M:%S').dt.time
df['delta'] = pd.to_timedelta(df.delta)

Now you can add columns start and delta and convert it to time:

df['sum'] = (df['start']   df['delta']).dt.time
df = df.drop(columns=['start', 'delta'])

Output:

>>> df
  Start_Time Duration       sum
0   07:30:00    03:00  07:33:00
1   07:30:00    06:00  07:36:00
2   07:30:00    07:00  07:37:00
3   07:30:00    28:00  07:58:00
4   08:50:00    50:00  09:40:00
5   11:30:00    32:00  12:02:00
6   10:30:00    24:00  10:54:00
7   06:45:00    15:55  07:00:55

CodePudding user response:

From your dataset, first we convert the column Start_Time to datetime :

df['Start_Time'] = pd.to_datetime(df['Start_Time'], format='%H:%M:%S')

Then, we can convert the Duration column to seconds :

df['Duration_second'] = pd.to_datetime(df['Duration'], format='%M:%S').dt.minute * 60   pd.to_datetime(df['Duration'], format='%M:%S').dt.second

We compute the addition with to_timedelta :

df['Result'] = df['Start_Time']   pd.to_timedelta(df['Duration_second'], unit='s')

To finish, we keep only the %H:%M:%S format using the strftime method to get the expected result :

df['Start_Time'] = df['Start_Time'].dt.strftime('%H:%M:%S')
df['Result'] = df['Result'].dt.strftime('%H:%M:%S')

Output

>>> df
    Start_Time  Duration    Duration_second Result
0   07:30:00    03:00       180             07:33:00
1   07:30:00    06:00       360             07:36:00
2   07:30:00    07:00       420             07:37:00
3   07:30:00    28:00       1680            07:58:00
4   08:50:00    50:00       3000            09:40:00
5   11:30:00    32:00       1920            12:02:00
6   10:30:00    24:00       1440            10:54:00
7   06:45:00    15:55       955             07:00:55

We can drop the Duration_second column using drop :

df = df.drop('Duration_second', axis=1)
  • Related