I have a data frame which contain two columns: time column df["Start_Time"]
in the format of %H:%M:%S
& a duration column df["Duration"]
which shows the duration that was spent doing a specific activity.
This is a sample of the data:
Start_Time Duration
0 07:30:00 03:00
1 07:30:00 06:00
2 07:30:00 07:00
3 07:30:00 28:00
4 08:50:00 50:00
5 11:30:00 32:00
6 10:30:00 24:00
7 06:45:00 15:55
My end goal is to sum the duration
to the start time
, figure out the end_time & how many days we passed doing this activity.
The problem is I don't know how to convert the column Duration into time column. Duration is a string column. I know I can't use pd.to_datetime()
.
I tried using strftime
but I had errors. Any ideas on how to convert the column duration ?
CodePudding user response:
You can use pd.to_datetime(format=...)
and pd.to_timedelta()
to do this. In your example you can do the following. We first need to add '00:'
add the beginning of your column Duration
as this will represent the hours (now it can be used in pd.to_timedelta()
).
df['delta'] = '00:' df['Duration']
df['start'] = pd.to_datetime(df.Start_Time, format='%H:%M:%S').dt.time
df['delta'] = pd.to_timedelta(df.delta)
Now you can add columns start
and delta
and convert it to time:
df['sum'] = (df['start'] df['delta']).dt.time
df = df.drop(columns=['start', 'delta'])
Output:
>>> df
Start_Time Duration sum
0 07:30:00 03:00 07:33:00
1 07:30:00 06:00 07:36:00
2 07:30:00 07:00 07:37:00
3 07:30:00 28:00 07:58:00
4 08:50:00 50:00 09:40:00
5 11:30:00 32:00 12:02:00
6 10:30:00 24:00 10:54:00
7 06:45:00 15:55 07:00:55
CodePudding user response:
From your dataset, first we convert the column Start_Time
to datetime
:
df['Start_Time'] = pd.to_datetime(df['Start_Time'], format='%H:%M:%S')
Then, we can convert the Duration
column to seconds :
df['Duration_second'] = pd.to_datetime(df['Duration'], format='%M:%S').dt.minute * 60 pd.to_datetime(df['Duration'], format='%M:%S').dt.second
We compute the addition with to_timedelta
:
df['Result'] = df['Start_Time'] pd.to_timedelta(df['Duration_second'], unit='s')
To finish, we keep only the %H:%M:%S
format using the strftime
method to get the expected result :
df['Start_Time'] = df['Start_Time'].dt.strftime('%H:%M:%S')
df['Result'] = df['Result'].dt.strftime('%H:%M:%S')
Output
>>> df
Start_Time Duration Duration_second Result
0 07:30:00 03:00 180 07:33:00
1 07:30:00 06:00 360 07:36:00
2 07:30:00 07:00 420 07:37:00
3 07:30:00 28:00 1680 07:58:00
4 08:50:00 50:00 3000 09:40:00
5 11:30:00 32:00 1920 12:02:00
6 10:30:00 24:00 1440 10:54:00
7 06:45:00 15:55 955 07:00:55
We can drop the Duration_second
column using drop
:
df = df.drop('Duration_second', axis=1)