YouTube video durations are given in the form PT1H15M9S.
I need to parse this string to convert the duration to seconds.
I tried the following code:
import re

def to_seconds(string=None):
    string.replace('PT','')
    if 'H' in string:
        h = string.index('H')
        m = string.index('M')
        secs = re.sub('[^0-9]','',string[-3:])
        mins = re.sub('[^0-9]','',string[m-2:m+1])
        hours = re.sub('[^0-9]','',string[h-2:h+1])
    elif 'M' in string:
        m = string.index('M')
        secs = re.sub('[^0-9]','',string[-3:])
        mins = re.sub('[^0-9]','',string[m-2:m+1])
        hours = 0
    else:
        secs = re.sub('[^0-9]','',string[-3:])
        mins, hours = 0, 0
    return int(hours) * 60 * 60 + int(mins) * 60 + int(secs)
However, this code does not always work, because some strings contain hours but no minutes or seconds, and so on.
For example: PT1H15S, PT1H, or PT12H4M.
How can I get this code to work in such cases?
CodePudding user response:
Sample and explanation of terms: https://regex101.com/r/l8cNAP/2
Regex:
r"PT(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+)S)?"
Query the resulting match object (or its groupdict()) and use 0 as the default for any group that did not match.
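A minimal sketch of that approach, assuming the duration never carries a days component. The function name to_seconds is borrowed from the question, and DURATION_RE is a name made up here for illustration:

```python
import re

# Every component is optional; \d+ requires at least one digit before each unit letter.
DURATION_RE = re.compile(r"PT(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+)S)?")

def to_seconds(duration):
    # fullmatch rejects strings with anything before or after the pattern.
    match = DURATION_RE.fullmatch(duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    # groupdict(default="0") substitutes "0" for each group that did not match.
    parts = match.groupdict(default="0")
    return int(parts["h"]) * 3600 + int(parts["m"]) * 60 + int(parts["s"])

print(to_seconds("PT1H15M9S"))  # 4509
print(to_seconds("PT1H15S"))    # 3615
print(to_seconds("PT1H"))       # 3600
print(to_seconds("PT12H4M"))    # 43440
```

Passing the default to groupdict() keeps the arithmetic free of None checks.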
CodePudding user response:
import re
apiDuration = 'PT1H15M9S'
regex = r"PT(?:(?P<hours>\d*)H)?(?:(?P<minutes>\d*)M)?(?:(?P<seconds>\d*)S)?"
parsedDuration = re.match(regex, apiDuration)
print(parsedDuration.groups()) # ('1', '15', '9')
print(parsedDuration.group('hours')) # 1
print(parsedDuration.group('minutes')) # 15
print(parsedDuration.group('seconds')) # 9
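For inputs such as PT1H15S the same pattern still matches, but any group that did not take part in the match comes back as None, so it needs a fallback before int() is called. A short sketch of the conversion under that assumption (the helper name duration_to_seconds is hypothetical):

```python
import re

regex = r"PT(?:(?P<hours>\d*)H)?(?:(?P<minutes>\d*)M)?(?:(?P<seconds>\d*)S)?"

def duration_to_seconds(api_duration):
    parsed = re.match(regex, api_duration)
    if parsed is None:
        raise ValueError(f"unsupported duration: {api_duration!r}")
    # A group that did not match is None; `or 0` also covers the empty
    # string that \d* permits (e.g. a bare "H" with no digits).
    hours = int(parsed.group('hours') or 0)
    minutes = int(parsed.group('minutes') or 0)
    seconds = int(parsed.group('seconds') or 0)
    return hours * 3600 + minutes * 60 + seconds

for sample in ('PT1H15S', 'PT1H', 'PT12H4M'):
    print(sample, duration_to_seconds(sample))
# PT1H15S 3615
# PT1H 3600
# PT12H4M 43440
```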