I have the following toy Pandas dataframe named df:
import pandas as pd
df = pd.DataFrame({'begin': ['08:00', '10:00', '14:00'],
                   'end': ['14:00', '17:00', '22:00']})
begin end
08:00 14:00
10:00 17:00
14:00 22:00
I would like to calculate the difference between the begin and end times (in hours). Here is my attempt:
df['begin'] = pd.to_datetime(df['begin'])
df['end'] = pd.to_datetime(df['end'])
df['diff_hours'] = (df['end'] - df['begin']) / pd.Timedelta(hours=1)
This yields:
begin end diff_hours
0 2021-12-28 08:00:00 2021-12-28 14:00:00 6
1 2021-12-28 10:00:00 2021-12-28 17:00:00 7
2 2021-12-28 14:00:00 2021-12-28 22:00:00 8
This approach attaches the current date to every value, because pd.to_datetime fills in today's date when the string contains only a time. I would like to strip the date from the output above, so that df becomes:
begin end diff_hours
0 08:00:00 14:00:00 6
1 10:00:00 17:00:00 7
2 14:00:00 22:00:00 8
Thanks in advance for any assistance you can give!
CodePudding user response:
Coerce the times to datetime, subtract, and convert the result to hours. The begin and end columns themselves are left unchanged:
df['diff_hours'] = (pd.to_datetime(df['end'], format="%H:%M") - pd.to_datetime(df['begin'], format="%H:%M")).dt.total_seconds() / 3600
begin end diff_hours
0 08:00 14:00 6.0
1 10:00 17:00 7.0
2 14:00 22:00 8.0
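A related variant, not taken from the answer above: since these values are really durations since midnight, they could also be parsed with pd.to_timedelta, which never attaches a date in the first place. This is only a minimal sketch. It assumes every string is in HH:MM form, and the names begin_td and end_td are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'begin': ['08:00', '10:00', '14:00'],
                   'end': ['14:00', '17:00', '22:00']})

# Assumption: every value looks like 'HH:MM'. Appending ':00' gives the
# 'HH:MM:SS' shape that pd.to_timedelta parses.
begin_td = pd.to_timedelta(df['begin'] + ':00')
end_td = pd.to_timedelta(df['end'] + ':00')

# Dividing a timedelta Series by a one-hour Timedelta yields float hours.
df['diff_hours'] = (end_td - begin_td) / pd.Timedelta(hours=1)

print(df)
```

The original string columns stay as they are, so no date needs to be stripped afterwards.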
CodePudding user response:
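You can convert the datetime columns to time-only values with .dt.time. Do this after computing diff_hours, because datetime.time objects cannot be subtracted from each other: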
df['begin'] = df['begin'].dt.time
df['end'] = df['end'].dt.time
begin end diff_hours
0 08:00:00 14:00:00 6.0
1 10:00:00 17:00:00 7.0
2 14:00:00 22:00:00 8.0
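Putting the pieces of this thread together, the whole thing might look roughly like the sketch below. It assumes the times are HH:MM strings and that end is never earlier than begin on the same day. The intermediate names begin_dt and end_dt are only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({'begin': ['08:00', '10:00', '14:00'],
                   'end': ['14:00', '17:00', '22:00']})

# Parse with an explicit format; the date part that pandas fills in is
# irrelevant because it is the same for both columns.
begin_dt = pd.to_datetime(df['begin'], format='%H:%M')
end_dt = pd.to_datetime(df['end'], format='%H:%M')

# Compute the difference first, while the values are still datetimes.
df['diff_hours'] = (end_dt - begin_dt).dt.total_seconds() / 3600

# Only then drop the date by keeping the time-of-day component.
df['begin'] = begin_dt.dt.time
df['end'] = end_dt.dt.time

print(df)
```

Note that after .dt.time the begin and end columns hold plain datetime.time objects, so further date arithmetic on them would need another conversion.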