Measuring elapsed time in Pandas

Time:10-06

I'm trying to make a simple analysis of my sport activities where I have elapsed time in the string format like this:

00:22:05
00:30:34
00:30:31
00:37:19
00:28:43
00:22:08

I've tried converting the column to the pandas datetime type, but I'm only interested in the duration of each activity, so that I can calculate the mean, for instance, or work out how long I paused during a whole run.

I've tried this code, but it doesn't solve my problem.

df_test['Elapsed time'] = pd.to_datetime(df_test['Elapsed time'], format = '%H:%M:%S')

Any ideas how I can make that work? I've been trying to find answers but nothing helps. And I'm still new to Pandas. Thanks in advance.

CodePudding user response:

Welcome to StackOverflow. I think the question you are looking to answer is how to convert the time string to a datetime format without the date portion. Doing so requires only a minor modification to your code.

pd.to_datetime(df['Elapsed Time'], format = '%H:%M:%S').dt.time

Complete code:

import pandas as pd

data_dict = { 'Elapsed Time': ['00:22:05', '00:30:34', '00:30:31', '00:37:19', '00:28:43', '00:22:08'] }

df = pd.DataFrame.from_dict(data_dict)

df['Formatted Time'] = pd.to_datetime(df['Elapsed Time'], format = '%H:%M:%S').dt.time

type(df['Elapsed Time'][0]) # 'str'

type(df['Formatted Time'][0]) # 'datetime.time'

Computing with Time

In order to perform analysis of the data you'll need to convert the time value to something useful, such as seconds. Here I'll present two methods of doing that.

The first method performs manual calculations using the original time string.

def total_seconds_in_time_string(time_string):
    segments = time_string.strip().split(':')
    # segments: [ 'HH', 'MM', 'SS' ]
    # total seconds = (((HH * 60) + MM) * 60) + SS
    return (((int(segments[0]) * 60) + int(segments[1])) * 60) + int(segments[2])

df['Total Seconds'] = df['Elapsed Time'].apply(total_seconds_in_time_string)

type(df['Total Seconds'][0]) # 'numpy.int64'

df['Total Seconds'].mean() # 1713.3333333333333

def seconds_to_timestring(secs):
    import time
    time_secs = time.gmtime(round(secs))
    return time.strftime('%H:%M:%S', time_secs)

avg_time_str = seconds_to_timestring(df['Total Seconds'].mean())

print(avg_time_str) # '00:28:33'
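
The same arithmetic can also be done column-wise, without calling a Python function for every row. This is only a minimal sketch, assuming every value is a well-formed HH:MM:SS string; the names parts and seconds are illustrative and not part of the code above.

```python
import pandas as pd

df = pd.DataFrame({
    'Elapsed Time': ['00:22:05', '00:30:34', '00:30:31',
                     '00:37:19', '00:28:43', '00:22:08']
})

# Split 'HH:MM:SS' into three integer columns labelled 0, 1 and 2.
parts = df['Elapsed Time'].str.strip().str.split(':', expand=True).astype(int)

# Combine hours, minutes and seconds into total seconds per row.
seconds = parts[0] * 3600 + parts[1] * 60 + parts[2]

mean_seconds = seconds.mean()
print(mean_seconds)
```

A malformed value (for example a missing seconds field) would make the astype(int) call fail, so real data may need validating first.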

The second method is the more Pythonic one: it uses the standard-library datetime module on the datetime.time values in the 'Formatted Time' column.

def total_seconds_in_time(t):
    from datetime import timedelta
    return timedelta(hours=t.hour, minutes=t.minute, seconds=t.second).total_seconds()

df['TimeDelta Seconds'] = df['Formatted Time'].apply(total_seconds_in_time)

type(df['TimeDelta Seconds'][0]) # 'numpy.float64'

df['TimeDelta Seconds'].mean() # 1713.3333333333333

def seconds_to_timedelta(secs):
    from datetime import timedelta
    return timedelta(seconds=round(secs))

mean_avg = seconds_to_timedelta(df['TimeDelta Seconds'].mean())

print(mean_avg) # '0:28:33'
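
A possible third option, not one of the two methods above, is to let pandas parse the strings as durations with pd.to_timedelta, which gives a timedelta column that supports mean() and sum() directly. The sketch below assumes the values are HH:MM:SS strings; the column name 'Elapsed' and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Elapsed Time': ['00:22:05', '00:30:34', '00:30:31',
                     '00:37:19', '00:28:43', '00:22:08']
})

# Parse the strings as durations rather than as points in time.
df['Elapsed'] = pd.to_timedelta(df['Elapsed Time'])

mean_elapsed = df['Elapsed'].mean()    # a pandas Timedelta
total_elapsed = df['Elapsed'].sum()    # a pandas Timedelta

# Plain floats are available when a numeric value is easier to work with.
mean_seconds = df['Elapsed'].dt.total_seconds().mean()

print(mean_elapsed.round('s'))
print(total_elapsed)
```

Because the column holds durations, subtracting one timedelta column from another also works, so if the data had a separate moving-time column (hypothetical here), the paused time per run could be computed as the difference between the two.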