I'm trying to make a simple analysis of my sport activities where I have elapsed time in the string format like this:
00:22:05
00:30:34
00:30:31
00:37:19
00:28:43
00:22:08
I've tried converting it to the pandas datetime type, but I'm only interested in the duration of my activities, so that I can calculate the mean, for instance, or how long I paused during a whole run.
I've tried the code below, but it doesn't resolve my issue.
df_test['Elapsed time'] = pd.to_datetime(df_test['Elapsed time'], format = '%H:%M:%S')
Any ideas how I can make that work? I've been trying to find answers but nothing helps. And I'm still new to Pandas. Thanks in advance.
CodePudding user response:
Welcome to StackOverflow. I think the question you are looking to answer is how to convert the time string to a datetime format without the date portion. Doing so requires only a minor modification to your code.
pd.to_datetime(df['Elapsed Time'], format = '%H:%M:%S').dt.time
Complete code:
import pandas as pd
data_dict = { 'Elapsed Time': ['00:22:05', '00:30:34', '00:30:31', '00:37:19', '00:28:43', '00:22:08'] }
df = pd.DataFrame.from_dict(data_dict)
df['Formatted Time'] = pd.to_datetime(df['Elapsed Time'], format = '%H:%M:%S').dt.time
type(df['Elapsed Time'][0]) # 'str'
type(df['Formatted Time'][0]) # 'datetime.time'
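One reason the time-only values are not enough for analysis: datetime.time objects represent a time of day and do not support arithmetic, so they cannot be summed or averaged directly. A minimal sketch using only the standard library (the names first, second, and as_timedelta are illustrative, not from the code above):

```python
from datetime import time, timedelta

first = time(0, 22, 5)
second = time(0, 30, 34)

# datetime.time defines no arithmetic operators, so adding two values fails.
try:
    first + second
    can_add_times = True
except TypeError:
    can_add_times = False


def as_timedelta(t):
    # Hypothetical helper: reinterpret a time of day as a duration.
    return timedelta(hours=t.hour, minutes=t.minute, seconds=t.second)


total = as_timedelta(first) + as_timedelta(second)
print(can_add_times)  # False
print(total)          # 0:52:39
```

Durations (timedelta) or plain seconds are therefore a better fit than time-of-day values for anything like a mean.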
Computing with Time
In order to perform analysis of the data you'll need to convert the time value to something useful, such as seconds. Here I'll present two methods of doing that.
The first method performs manual calculations using the original time string.
def total_seconds_in_time_string(time_string):
    segments = time_string.strip().split(':')
    # segments: [ 'HH', 'MM', 'SS' ]
    # total seconds = (((HH * 60) + MM) * 60) + SS
    return (((int(segments[0]) * 60) + int(segments[1])) * 60) + int(segments[2])
df['Total Seconds'] = df['Elapsed Time'].apply(total_seconds_in_time_string)
type(df['Total Seconds'][0]) # 'numpy.int64'
df['Total Seconds'].mean() # 1713.3333333333333
def seconds_to_timestring(secs):
    import time
    time_secs = time.gmtime(round(secs))
    return time.strftime('%H:%M:%S', time_secs)
avg_time_str = seconds_to_timestring(df['Total Seconds'].mean())
print(avg_time_str) # '00:28:33'
The second method is the more Pythonic solution, using the datetime library.
def total_seconds_in_time(t):
    from datetime import timedelta
    # Dividing by a one-second timedelta yields the duration in seconds as a float.
    return timedelta(hours=t.hour, minutes=t.minute, seconds=t.second) / timedelta(seconds=1)
df['TimeDelta Seconds'] = df['Formatted Time'].apply(total_seconds_in_time)
type(df['TimeDelta Seconds'][0]) # 'numpy.float64'
df['TimeDelta Seconds'].mean() # 1713.3333333333333
def seconds_to_timedelta(secs):
    from datetime import timedelta
    return timedelta(seconds=round(secs))
mean_avg = seconds_to_timedelta(df['TimeDelta Seconds'].mean())
print(mean_avg) # '0:28:33'
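As a possible third option, not part of the methods above, pandas can parse 'HH:MM:SS' strings straight into durations with pd.to_timedelta, which avoids the intermediate datetime.time step. The sketch below assumes the same six sample values; the variable names are illustrative.

```python
import pandas as pd

elapsed = pd.Series(
    ['00:22:05', '00:30:34', '00:30:31', '00:37:19', '00:28:43', '00:22:08']
)

# Parse the strings as durations; the result has a timedelta64 dtype.
durations = pd.to_timedelta(elapsed)

# Timedelta columns support aggregation directly.
mean_duration = durations.mean()

# The same mean expressed as a plain number of seconds.
mean_seconds = durations.dt.total_seconds().mean()

print(mean_duration)
print(mean_seconds)
```

Because the column holds real durations, differences between two such columns (for example elapsed time minus a moving-time column, if your data has one) could be computed with ordinary subtraction.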