I have a list of time durations in text, for example, ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes']
I need to write a function that takes one of these durations and returns the total number of days.
Each string could contain days only, days and hours, hours and minutes, minutes only, or days, hours, and minutes.
I have tried the following:
def parse_dates(data):
    days = int(re.match(r'\d+\sDay', data)[0].split(' ')[0]) if re.match(r'\d+\sDay', data) is not None else 0
    hours = int(re.match(r'\d+\sHour', data)[0].split(' ')[0]) if re.match(r'^\d+ Hour*s$', data) is not None else 0
    minutes = int(re.match(r'\d+\sMinute', data)[0].split(' ')[0]) if re.match(r'\d+\sMinute', data) is not None else 0
    days += hours / 24
    days += minutes / 1440
    return days
However, the hours and minutes always come out as 0. How can I fix my regex, or devise a better solution, to parse these strings correctly?
CodePudding user response:
You could try the following regex:
(?:(\d+) Days)?(?: ?(\d+) Hours)?(?: ?(\d+) Minutes)?
Explanation:
(?:...) marks a non-capturing group
(...) marks a captured group
? after a symbol or group means it is optional
\d+ means one or more digits (0123...)
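As a quick check of how the optional groups behave, the sketch below runs the pattern over the sample strings and prints what each group captures. The names `pattern` and `captured` are made up for this illustration.

```python
import re

# Same pattern as above; every unit is optional, so a missing unit is captured as None.
pattern = re.compile(r'(?:(\d+) Days)?(?: ?(\d+) Hours)?(?: ?(\d+) Minutes)?')

def captured(s):
    """Return the (days, hours, minutes) groups as strings, with None for absent units."""
    # Because every part is optional, search() always finds at least an empty match.
    return pattern.search(s).groups()

for s in ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes']:
    print(s, '->', captured(s))
```

One caveat: as written, the pattern expects the plural unit names, so a string such as '1 Day' captures nothing. Writing `Days?`, `Hours?`, and `Minutes?` would accept both forms.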
Sample Python implementation:
import re

_DHM_RE = re.compile(r'(?:(\d+) Days)?(?: ?(\d+) Hours)?(?: ?(\d+) Minutes)?')
_HOURS_IN_DAY = 24
_MINUTES_IN_DAY = 60 * _HOURS_IN_DAY

def parse_dates(s: str) -> int:
    m = _DHM_RE.search(s)
    if m is None:
        return 0
    # Groups for missing units are None, so fall back to 0.
    days = int(m.group(1) or 0)
    hours = int(m.group(2) or 0)
    minutes = int(m.group(3) or 0)
    # Add the fractional days contributed by hours and minutes.
    days += hours / _HOURS_IN_DAY
    days += minutes / _MINUTES_IN_DAY
    # Truncate to whole days; return `days` instead to keep the fraction.
    return int(days)

strings = """\
142 Days 16 Hours
128 Days 9 Hours 43 Minutes
10 Minutes
52 Hours
""".splitlines()

for s in strings:
    d = parse_dates(s)
    print(f'{s!r} has {d} days.')
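For what it's worth, the likely reason the attempt in the question always gives 0 for hours and minutes is that `re.match` only matches at the start of the string, whereas `re.search` scans the whole string. A minimal sketch of the difference (the helper name `first_number_before` is invented for this example):

```python
import re

def first_number_before(unit, text, finder):
    """Return the integer in front of `unit`, using re.match or re.search; 0 if not found."""
    m = finder(r'(\d+)\s' + unit, text)
    return int(m.group(1)) if m else 0

text = '128 Days 9 Hours 43 Minutes'
for unit in ('Day', 'Hour', 'Minute'):
    # First number uses re.match (anchored at the start), second uses re.search.
    print(unit,
          first_number_before(unit, text, re.match),
          first_number_before(unit, text, re.search))
```

Only the leading unit is found by `re.match`; anything later in the string needs `re.search` (or a single pattern that covers the whole string, as above).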
CodePudding user response:
Here's a way to do it:
import re

a = ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes']

def parse_dates(data):
    # Search for the number in front of each unit; a missing unit yields None.
    x = [re.search(r'(\d+)\s' + unit, data) for unit in ['Day', 'Hour', 'Minute']]
    x = [0 if y is None else int(y.group(1)) for y in x]
    # Convert hours and minutes to fractions of a day.
    return x[0] + x[1] / 24 + x[2] / 1440

for data in a:
    print(parse_dates(data))
Output:
142.66666666666666
128.4048611111111
0.006944444444444444
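If the units might appear in singular form or in a different order, another option is to collect each number/unit pair with `re.findall` and let `datetime.timedelta` do the arithmetic. This is only a sketch under those assumptions; `_UNIT_RE` and `duration_in_days` are names invented here, not part of either approach above.

```python
import re
from datetime import timedelta

# Matches e.g. "1 Day", "16 Hours", "43 Minutes" (singular or plural).
_UNIT_RE = re.compile(r'(\d+)\s*(Day|Hour|Minute)s?')

def duration_in_days(text):
    """Convert the day/hour/minute parts of `text` to a total in fractional days."""
    # Build timedelta keyword arguments such as {'days': 142, 'hours': 16}.
    # If a unit appears twice, the last occurrence wins.
    parts = {unit.lower() + 's': int(num) for num, unit in _UNIT_RE.findall(text)}
    # Dividing one timedelta by another yields a float.
    return timedelta(**parts) / timedelta(days=1)

for text in ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes']:
    print(text, '->', duration_in_days(text))
```

A string with no recognizable units gives 0.0, since `timedelta()` with no arguments is a zero duration.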