Regex failing with text conversion to days - Python 3.10.x

Time:09-24

I have a list of time durations in text, for example, ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes']

I need to build a function to take these durations and instead come up with the total number of days.

The specific text could be a single day, days and hours, hours and minutes, a single set of minutes, or a day, hour, and minute.

I have tried the following:

def parse_dates(data):
    days = int(re.match(r'\d+\sDay', data)[0].split(' ')[0]) if re.match(r'\d+\sDay', data) is not None else 0
    hours = int(re.match(r'\d+\sHour', data)[0].split(' ')[0]) if re.match(r'^\d+ Hour*s$', data) is not None else 0
    minutes = int(re.match(r'\d+\sMinute', data)[0].split(' ')[0]) if re.match(r'\d+\sMinute', data) is not None else 0

    days += hours / 24
    days += minutes / 1440

    return days

However, the hours and minutes are ALWAYS showing as 0. How can I fix my regex, or devise a better solution, to parse these strings appropriately?

CodePudding user response:

You could try the following regex (Demo):

(?:(\d+) Days)?(?: ?(\d+) Hours)?(?: ?(\d+) Minutes)?

Explanation:

  • (?:...) marks a non-capturing group
  • (...) marks a captured group
  • ? after a symbol or group means it is optional
  • \d+ means one or more digits (\d alone matches a single digit, 0-9)
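
A quick way to see what each capture group picks up is to print groups() for the sample inputs from the question. This is only a sketch: the helper name captured_groups is made up for illustration, and the pattern is written with a + after each \d.

```python
import re

# The pattern from this answer, with a `+` quantifier after each `\d`.
DHM_RE = re.compile(r'(?:(\d+) Days)?(?: ?(\d+) Hours)?(?: ?(\d+) Minutes)?')


def captured_groups(text: str) -> tuple:
    """Return the (days, hours, minutes) groups; a group is None when its unit is absent."""
    # Every part of the pattern is optional, so search() always finds a
    # (possibly empty) match at position 0 and never returns None.
    return DHM_RE.search(text).groups()


for sample in ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes']:
    print(f'{sample!r} -> {captured_groups(sample)}')
```

Groups that did not take part in the match come back as None, which is why the implementation falls back to 0 for them.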

Sample Python implementation:

import re

_DHM_RE = re.compile(r'(?:(\d+) Days)?(?: ?(\d+) Hours)?(?: ?(\d+) Minutes)?')
_HOURS_IN_DAY = 24
_MINUTES_IN_DAY = 60 * _HOURS_IN_DAY


def parse_dates(s: str) -> int:
    m = _DHM_RE.search(s)
    if m is None:
        return 0

    days = int(m.group(1) or 0)
    hours = int(m.group(2) or 0)
    minutes = int(m.group(3) or 0)

    days += hours / _HOURS_IN_DAY
    days += minutes / _MINUTES_IN_DAY

    return int(days)


strings = """\
142 Days 16 Hours
128 Days 9 Hours 43 Minutes
10 Minutes
52 Hours
""".splitlines()

for s in strings:
    d = parse_dates(s)
    print(f'{s!r} has {d} days.')
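
Two things to keep in mind with the sample above: the pattern only recognises the plural unit names ("1 Day" is not picked up), and int(days) truncates to whole days. If singular units and fractional days are needed, a variant along these lines might work. The names _DHM_FLEX_RE and parse_days are hypothetical, and rejecting unrecognised input with ValueError is an assumption rather than something the question asks for.

```python
import re

# Hypothetical variant: `s?` accepts "Day" or "Days", `\s*` tolerates flexible spacing.
_DHM_FLEX_RE = re.compile(
    r'(?:(\d+)\s*Days?)?\s*(?:(\d+)\s*Hours?)?\s*(?:(\d+)\s*Minutes?)?'
)


def parse_days(s: str) -> float:
    """Convert text such as '1 Day 12 Hours' to a fractional number of days."""
    m = _DHM_FLEX_RE.fullmatch(s.strip())
    # fullmatch() also succeeds on an empty string (every part is optional),
    # so additionally require that at least one unit was captured.
    if m is None or not any(m.groups()):
        raise ValueError(f'unrecognised duration: {s!r}')
    days, hours, minutes = (int(g or 0) for g in m.groups())
    return days + hours / 24 + minutes / 1440


for text in ['1 Day 12 Hours', '52 Hours', '10 Minutes']:
    print(f'{text!r} is {parse_days(text)} days.')
```

Using fullmatch() instead of search() means stray text before or after the duration is reported as an error rather than silently ignored.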

CodePudding user response:

Here's a way to do it:

import re
a = ['142 Days 16 Hours', '128 Days 9 Hours 43 Minutes', '10 Minutes']
def parse_dates(data):
    # re.search scans the whole string, so each unit is found wherever it appears.
    x = [re.search(r'(\d+)\s' + unit, data) for unit in ['Day', 'Hour', 'Minute']]
    x = [0 if y is None else int(y.group(1)) for y in x]
    return x[0] + x[1] / 24 + x[2] / 1440
for data in a:
    print(parse_dates(data))

Output:

142.66666666666666
128.4048611111111
0.006944444444444444
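
As for why the code in the question reports 0 hours and minutes: re.match only attempts a match at the very start of the string, whereas re.search looks for the pattern anywhere in it. A minimal sketch of the difference, assuming the question's hour pattern was meant to read \d+\sHour (the variable names are illustrative):

```python
import re

text = '128 Days 9 Hours 43 Minutes'

# re.match only tries position 0, where the text reads "128 Days", so it fails.
at_start = re.match(r'\d+\sHour', text)

# re.search tries every position and finds the hours further along.
anywhere = re.search(r'\d+\sHour', text)

print(at_start)           # None
print(anywhere.group(0))  # 9 Hour
```

That is why swapping re.match for re.search, as this answer does, is enough to pick up the hours and minutes.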