regex to match and not capture some part of the string

Time:05-18

I am trying to capture dates that can be in a string like this

'30 jan and 6 apr and 12 oct 2022'

I am using the Python regex module (it's the same as re but has an 'overlapped' option). I need the end result to be this list:

['30 jan 2022', '6 apr 2022', '12 oct 2022']

So far, with this expression

regex.findall(r'(?:\d\d | \d )(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(?:.*)20(?:\d\d)', d, overlapped=True)

I am getting

['30 jan and 6 apr and 12 oct 2022', ' 6 apr and 12 oct 2022', '12 oct 2022']

Thanks in advance.

CodePudding user response:

You might use 2 capture groups, with the year captured inside a lookahead so that it is not consumed, and join them in a list comprehension:

\b(\d+ (?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec))(?=.*\b(20\d\d))\b

import re

pattern = r"\b(\d+ (?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec))(?=.*\b(20\d\d))\b"
s = "30 jan and 6 apr and 12 oct 2022"

# findall returns (day-month, year) tuples; join each into one string
res = [' '.join(groups) for groups in re.findall(pattern, s)]
print(res)

Output

['30 jan 2022', '6 apr 2022', '12 oct 2022']
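
To see why the lookahead helps: (?=...) checks the text ahead without consuming it, so each match covers only the day and month, while group 2 can still pick up the year from the end of the string for every match. Here is a minimal sketch using re.finditer. The helper name extract_dates is made up for illustration, and the month list assumes apr is what was intended:

```python
import re

MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"
# Group 1: day and month. Group 2: the year, captured inside a lookahead,
# so it is not consumed and stays available for every following match.
PATTERN = re.compile(rf"\b(\d+ (?:{MONTHS}))(?=.*\b(20\d\d))\b")


def extract_dates(text):
    """Return 'day month year' strings that share the one trailing year."""
    dates = []
    for m in PATTERN.finditer(text):
        # m.span() covers only group 1; the lookahead itself is zero-width.
        dates.append(f"{m.group(1)} {m.group(2)}")
    return dates


print(extract_dates("30 jan and 6 apr and 12 oct 2022"))
```

If the string contains no 20xx year after a day and month, the lookahead fails and that date is skipped.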

Note that (?:.*) and (?:\d\d) do not need the non-capturing group: a group that is neither quantified nor used to scope an alternation has no effect on the match.
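
A quick way to check that note is to run the pattern from the question with and without those groups. This sketch uses the standard re module (so without the overlapped option) and assumes apr in the month list. The variable names are only illustrative:

```python
import re

s = "30 jan and 6 apr and 12 oct 2022"
months = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"

# Pattern as in the question, with the redundant non-capturing groups.
with_groups = rf"(?:\d\d | \d )(?:{months})(?:.*)20(?:\d\d)"
# Same pattern with the groups around .* and \d\d removed.
without_groups = rf"(?:\d\d | \d )(?:{months}).*20\d\d"

# Only the grouping differs, so both calls should return the same matches.
same = re.findall(with_groups, s) == re.findall(without_groups, s)
print(same)
```

The groups around the alternations are still needed, since they limit how far each | reaches.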
