Extract n years m months and x days pattern using python regex

Time:07-20

I am trying to match an "n years m months and x days" pattern using regex. Any of "n years", "m months", "x days" and the word "and" may or may not be in the string. For an exact match I am able to extract this using the regex:

re.search(r'(?:\d+ year(s?))?\s*(?:\d+ month(s?))?\s*(?:\d+ day(s?))?', '2 years 25 days')

which returns 2 years 25 days, but if there is additional text in the string I don't get the match, for example:

re.search(r'(?:\d+ year(s?))?\s*(?:\d+ month(s?))?\s*(?:\d+ day(s?))?', 'in 2 years 25 days')

returns ''

I tried this:

re.search(r'.*(?:\d+ year(s?))?\s*(?:\d+ month(s?))?\s*(?:\d+ day(s?))?.*', 'in 2 years 25 days')

which returns the whole string, but I don't want the additional text.

CodePudding user response:

Since years, months, days are temporal units, you could use the pint module for that.

Parse temporal units with pint

See the String parsing tutorial and related features used:

from pint import UnitRegistry

ureg = UnitRegistry()

temporal_strings = '2 years and 25 days'.split('and')  # split on 'and', which also drops it
quantities = [ureg(q) for q in temporal_strings]  # parse quantities
# [<Quantity(2, 'year')>, <Quantity(25, 'day')>]

# print the quantities separately
for q in quantities:
    print(q)

# get the total days
print(f"total: {sum(quantities)}")
print(f"total days: {sum(quantities).to('days')}")

Output printed:

2 year
25 day
total: 2.0684462696783026 year
total days: 755.5 day
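
If the input has no "and" to split on, or pint is not installed, a rough standard-library sketch can pull out each number and unit pair with a regex and add them up in days. The name `total_days` and the conversion table are hypothetical, and the factors (365.25 days per year, one twelfth of that per month) are assumptions chosen to agree with the 755.5 day total printed above.

```python
import re

# Assumed average lengths in days; chosen to agree with the total above.
DAYS_PER_UNIT = {'year': 365.25, 'month': 365.25 / 12, 'day': 1.0}

def total_days(text):
    """Sum every '<number> <unit>' pair found in text, expressed in days."""
    pairs = re.findall(r'(\d+)\s*(year|month|day)s?\b', text)
    return sum(int(n) * DAYS_PER_UNIT[unit] for n, unit in pairs)

print(total_days('2 years and 25 days'))  # 755.5
```

This does not do calendar-aware arithmetic; it only scales each unit by a fixed average length.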

CodePudding user response:

You get an empty string for 'in 2 years 25 days' because all the parts in the regex are optional, so the pattern can also match an empty string, and it does so at the very start of the input.
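
A small sketch that makes this visible: `re.search` returns the first position at which the pattern can match, and a pattern made only of optional parts can match zero characters at index 0. The pattern below assumes the question's regex used `\d+ ` (one or more digits followed by a space); the variable names are only for illustration.

```python
import re

# Assumed form of the question's pattern: every group is optional.
all_optional = r'(?:\d+ year(s?))?\s*(?:\d+ month(s?))?\s*(?:\d+ day(s?))?'

m = re.search(all_optional, 'in 2 years 25 days')
empty_match_span = m.span()    # (0, 0): a zero-length match at index 0
empty_match_text = m.group()   # ''

# With no leading text, the same pattern consumes the whole duration.
exact = re.search(all_optional, '2 years 25 days')
exact_text = exact.group()     # '2 years 25 days'

print(empty_match_span, repr(empty_match_text), repr(exact_text))
```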

If all the parts are optional but you want to match at least 1 of them, you can use a leading assertion.

\b(?=\d+ (?:years?|months?|days?)\b)(?:\d+ years?)?(?:\s*\d+ months?)?(?:\s*\d+ days?)?\b

Explanation

  • \b A word boundary
  • (?=\d+ (?:years?|months?|days?)\b) Assert that directly to the right there are 1 or more digits, a space, and 1 of the alternatives
  • (?:\d+ years?)? Optionally match 1 or more digits, a space, and year or years
  • (?:\s*\d+ months?)? Same for months, preceded by optional whitespace
  • (?:\s*\d+ days?)? Same for days
  • \b A word boundary


Example

import re

pattern = r'\b(?=\d+ (?:years?|months?|days?)\b)(?:\d+ years?)?(?:\s*\d+ months?)?(?:\s*\d+ days?)?\b'
m = re.search(pattern, 'in 2 years 25 days')
if m:
    print(m.group())

Output

2 years 25 days
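
If the goal is to get n, m and x as numbers instead of the matched text, one possible extension is to add named groups to the same pattern. This is only a sketch: the group names and the helper `extract_duration` are hypothetical, and treating a missing part as 0 is an assumption.

```python
import re

# Same idea as the pattern above, with hypothetical group names added.
PATTERN = re.compile(
    r'\b(?=\d+ (?:years?|months?|days?)\b)'
    r'(?:(?P<years>\d+) years?)?'
    r'(?:\s*(?P<months>\d+) months?)?'
    r'(?:\s*(?P<days>\d+) days?)?\b'
)

def extract_duration(text):
    """Return (years, months, days) as ints, or None if nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Groups that did not take part in the match are None; treat them as 0.
    return tuple(int(m.group(name) or 0) for name in ('years', 'months', 'days'))

print(extract_duration('in 2 years 25 days'))   # (2, 0, 25)
print(extract_duration('due in 3 months'))      # (0, 3, 0)
print(extract_duration('no duration here'))     # None
```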

CodePudding user response:

You can try this, which makes the days part required so the pattern can no longer match an empty string:

import re
match = re.search(r'(?:\d+ year(s?))?\s*(?:\d+ month(s?))?\s*(?:\d+ day(s?))', 'in 2 years 25 days')
if match:
    print(match.group())

Output:

2 years 25 days
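
One caveat worth checking, shown here as a small sketch: the days group in this pattern has no trailing `?`, so input without a days part gives no match at all. The pattern assumes `\d+ ` as in the question, and the second test string is a made-up example.

```python
import re

# The days part is required (no trailing '?'); years and months stay optional.
pattern = r'(?:\d+ year(s?))?\s*(?:\d+ month(s?))?\s*(?:\d+ day(s?))'

with_days = re.search(pattern, 'in 2 years 25 days')
without_days = re.search(pattern, 'in 2 years 3 months')

print(with_days.group())   # 2 years 25 days
print(without_days)        # None
```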