I am trying to match an "n years m months and x days" pattern using regex. The "n years", "m months", "x days" and "and" parts may or may not be in the string. For an exact match I am able to extract this using the regex:
re.search(r'(?:\d+ year(s?))?\s*(?:\d+ month(s?))?\s*(?:\d+ day(s?))?', '2 years 25 days')
which returns 2 years 25 days, but if there is additional text in the string I don't get the match, for example:
re.search(r'(?:\d+ year(s?))?\s*(?:\d+ month(s?))?\s*(?:\d+ day(s?))?', 'in 2 years 25 days')
returns ''
I tried this:
re.search(r'.*(?:\d+ year(s?))?\s*(?:\d+ month(s?))?\s*(?:\d+ day(s?))?.*', 'in 2 years 25 days')
which returns the whole string, but I don't want the additional text.
CodePudding user response:
Since years, months and days are temporal units, you could use the pint module for that.
Parse temporal units with pint
See the String parsing tutorial and related features used:
- printing quantities, see String formatting
- converting quantities, see Converting to different units
from pint import UnitRegistry
ureg = UnitRegistry()
temporal_strings = '2 years and 25 days'.split('and')  # split on 'and', which also removes it
quantities = [ureg(q) for q in temporal_strings]  # parse each part into a quantity
# [<Quantity(2, 'year')>, <Quantity(25, 'day')>]
# print the quantities separately
for q in quantities:
    print(q)
# get the total days
print(f"total: {sum(quantities)}")
print(f"total days: {sum(quantities).to('days')}")
Output printed:
2 year
25 day
total: 2.0684462696783026 year
total days: 755.5 day
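The fractional totals follow from pint's default definition of a year as 365.25 days. A minimal plain-Python check of that arithmetic, assuming that definition (the constant name DAYS_PER_YEAR is only illustrative and is not part of pint):

```python
# Assumption: pint's default registry defines one year as 365.25 days.
DAYS_PER_YEAR = 365.25

# 2 years plus 25 days, expressed in days and then back in years.
total_days = 2 * DAYS_PER_YEAR + 25
total_years = total_days / DAYS_PER_YEAR

print(total_days)   # 755.5
print(total_years)  # roughly 2.068
```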
CodePudding user response:
You get an empty string because all the parts in the regex are optional, so the pattern can also match the empty string, and re.search finds that empty match at the very start of the input.
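The empty match can be checked directly. This is a small sketch using a simplified form of the question's pattern (written with \d+ and without the capturing groups); the variable names are only illustrative:

```python
import re

# Every group is optional, so the pattern can succeed without consuming anything.
pattern = r'(?:\d+ years?)?\s*(?:\d+ months?)?\s*(?:\d+ days?)?'

m = re.search(pattern, 'in 2 years 25 days')
empty_span = m.span()
empty_text = m.group()

# The search stops at index 0, before the "i" of "in", with a zero-length match.
print(empty_span, repr(empty_text))  # (0, 0) ''
```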
If all the parts are optional but you want to match at least one of them, you can use a leading lookahead assertion.
\b(?=\d+ (?:years?|months?|days?)\b)(?:\d+ years?)?(?:\s*\d+ months?)?(?:\s*\d+ days?)?\b
Explanation
- \b A word boundary
- (?=\d+ (?:years?|months?|days?)\b) Assert 1+ digits, a space and one of the alternatives to the right
- (?:\d+ years?)? Optionally match 1+ digits, a space and year or years
- (?:\s*\d+ months?)? Same for months, allowing leading whitespace
- (?:\s*\d+ days?)? Same for days
- \b A word boundary
Example
import re
pattern = r'\b(?=\d+ (?:years?|months?|days?)\b)(?:\d+ years?)?(?:\s*\d+ months?)?(?:\s*\d+ days?)?\b'
m = re.search(pattern, 'in 2 years 25 days')
if m:
    print(m.group())
Output
2 years 25 days
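If the individual numbers are needed rather than the matched text, the same idea can be combined with named groups. The following is a sketch, not part of the answer above: the pattern is written with \d+ so multi-digit values match, and the names DURATION and parse_duration are hypothetical.

```python
import re

# Lookahead guarantees at least one unit is present; named groups capture each number.
DURATION = re.compile(
    r'\b(?=\d+ (?:years?|months?|days?)\b)'
    r'(?:(?P<years>\d+) years?)?'
    r'(?:\s*(?P<months>\d+) months?)?'
    r'(?:\s*(?P<days>\d+) days?)?\b'
)

def parse_duration(text):
    """Return a dict of years/months/days found in text, or None if nothing matches."""
    m = DURATION.search(text)
    if m is None:
        return None
    # Groups that did not take part in the match are None; treat them as 0.
    return {unit: int(value) if value else 0 for unit, value in m.groupdict().items()}

print(parse_duration('in 2 years 25 days'))  # {'years': 2, 'months': 0, 'days': 25}
print(parse_duration('no duration here'))    # None
```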
CodePudding user response:
You can try this:
import re
match = re.search(r'(?:\d+ year(s?))?\s*(?:\d+ month(s?))?\s*(?:\d+ day(s?))', 'in 2 years 25 days')
if match:
    print(match.group())
Output:
2 years 25 days
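One thing to keep in mind with this approach is that the day part is no longer optional, so a string without days will not match at all. A quick check, using a simplified form of the pattern (written with \d+ and non-capturing plurals; the variable names are only illustrative):

```python
import re

# The final group has no trailing "?", so a day part is required.
pattern = r'(?:\d+ years?)?\s*(?:\d+ months?)?\s*(?:\d+ days?)'

with_days = re.search(pattern, 'in 2 years 25 days')
without_days = re.search(pattern, 'in 2 years 3 months')

print(with_days.group())  # 2 years 25 days
print(without_days)       # None
```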