How to datetime parse a non-standardized time format

Time:10-26

I would like to create datetime objects from a list of string timecodes like these. However, dateutil's parser.parse interprets them incorrectly for my use case.

from datetime import datetime
from dateutil import parser

timecodes = ['0:00', '0:01', '1:01', '10:01', '1:10:01']

for timecode in timecodes:
    dt = parser.parse(timecode)
    print(dt)

The list above comes from YouTube's transcript timecodes. When copied from the site, they use a variable format for hours, minutes, and seconds, depending on the elapsed time:

0:00     # 0 minutes, 0 seconds
0:01     # 0 minutes, 1 second
1:01     # 1 minute, 1 second
10:01    # 10 minutes, 1 second
1:10:01  # 1 hour, 10 minutes, 1 second

and parser.parse returns (comments are my interpretations):

2022-10-24 00:00:00    # 0 minutes, 0 seconds
2022-10-24 00:01:00    # 1 minute, 0 seconds
2022-10-24 01:01:00    # 1 hour, 1 minute, 0 seconds
2022-10-24 10:01:00    # 10 hours, 1 minute, 0 seconds
2022-10-24 01:10:01    # 1 hour, 10 minutes, 1 second

That is, if a string doesn't contain a full timecode with hours, minutes, and seconds, parse appears to treat the minutes as hours and the seconds as minutes.

How can I either parse the list dynamically so that the default interpretation is minutes and seconds rather than hours and minutes, or adjust the timecodes intelligently so that they conform to the format parse expects?
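A minimal sketch of the first option, assuming the standard library's datetime.strptime is acceptable in place of dateutil (a swapped-in technique, not what the question used): choose the format from the number of colons, so a two-field timecode is read as minutes:seconds. The helper name parse_timecode is hypothetical. Note that strptime fills in the date 1900-01-01, limits hours to 0-23, and requires two-field minutes to be 0-59, so this only suits timecodes shorter than a day.

```python
from datetime import datetime

def parse_timecode(tc: str) -> datetime:
    """Hypothetical helper: read 'M:SS' as minutes:seconds and 'H:MM:SS' as hours:minutes:seconds."""
    # One colon -> minutes:seconds; otherwise assume hours:minutes:seconds
    fmt = "%M:%S" if tc.count(":") == 1 else "%H:%M:%S"
    # strptime fills the missing date fields with 1900-01-01
    return datetime.strptime(tc, fmt)

timecodes = ['0:00', '0:01', '1:01', '10:01', '1:10:01']
for tc in timecodes:
    # Print only the time-of-day part, since the date is a placeholder
    print(parse_timecode(tc).time())
```

For elapsed durations, a timedelta (as in the answers) is usually a better fit than a datetime.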

Answer:

This is a little tricky, but it should work:

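import datetime

timecodes = ['0:00', '0:01', '1:01', '10:01', '1:10:01']
zeroes = ['0', '0', '0']
dt = []
for i in timecodes:
    sep = i.split(':')
    # Left-pad with zeroes so every entry is [hours, minutes, seconds]
    sep = zeroes[:3 - len(sep)] + sep
    # Weight each field by its position (3600, 60, 1); enumerate avoids the
    # duplicate-value bug of sep.index(s), e.g. for '5:05:05'
    total = sum(int(s) * 60**(2 - pos) for pos, s in enumerate(sep))
    dt.append(str(datetime.timedelta(seconds=total)))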

Output:

dt = ['0:00:00', '0:00:01', '0:01:01', '0:10:01', '1:10:01']
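The question asked for datetime objects rather than durations. One way to get them, sketched under the assumption that each timecode is an offset from some known start time, is to keep the durations as timedelta objects and add them to a reference datetime. Both to_timedelta and video_start are hypothetical; the reference date below merely matches the date in the question's output.

```python
from datetime import datetime, timedelta

def to_timedelta(tc: str) -> timedelta:
    """Hypothetical helper: convert 'S', 'M:SS' or 'H:MM:SS' to a timedelta."""
    # Left-pad with '0' so the last three fields are always hours, minutes, seconds
    h, m, s = (['0', '0'] + tc.split(':'))[-3:]
    return timedelta(hours=int(h), minutes=int(m), seconds=int(s))

# Hypothetical reference point: when the video (or recording) started
video_start = datetime(2022, 10, 24)

timecodes = ['0:00', '0:01', '1:01', '10:01', '1:10:01']
stamps = [video_start + to_timedelta(tc) for tc in timecodes]
for ts in stamps:
    print(ts)
```

Keeping the timedelta around also makes it easy to sort, compare, or subtract timecodes before deciding on a reference date.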

Answer:

Another option is to map the duration components to integers in reverse order (seconds, minutes, hours) and convert them to seconds by multiplying with the matching factors (1, 60, 3600) via zip. Summing those gives the total seconds, which you can convert to a timedelta:

from datetime import timedelta

timecodes = ['0:00', '0:01', '1:01', '10:01', '1:10:01']

for t in timecodes:
    print(
        timedelta(seconds=sum(a*b for a, b in zip(map(int, t.split(":")[::-1]), (1, 60, 3600))))
    )

Output:

0:00:00
0:00:01
0:01:01
0:10:01
1:10:01
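One subtlety of the zip one-liner: zip stops at the shorter input, so a malformed timecode with more than three fields (e.g. '1:2:3:4') would have its leading field dropped silently. A sketch of a stricter variant, with the hypothetical name timecode_to_timedelta, that raises instead:

```python
from datetime import timedelta

def timecode_to_timedelta(tc: str) -> timedelta:
    """Hypothetical helper: convert 'S', 'M:SS' or 'H:MM:SS' to a timedelta, rejecting longer input."""
    parts = [int(p) for p in tc.split(":")]  # int() raises ValueError on empty or non-numeric fields
    if len(parts) > 3:
        # zip() below would otherwise silently ignore the extra leading fields
        raise ValueError(f"unexpected timecode: {tc!r}")
    # Pair seconds, minutes, hours with 1, 60, 3600 and add them up
    seconds = sum(value * factor for value, factor in zip(reversed(parts), (1, 60, 3600)))
    return timedelta(seconds=seconds)

print(timecode_to_timedelta("1:10:01"))                # 1:10:01
print(timecode_to_timedelta("10:01").total_seconds())  # 601.0
```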