I am trying to capture dates that can appear in a string like this:
'30 jan and 6 apr and 12 oct 2022'
I am using the Python regex module (it's the same as re but has an 'overlapped' option). I need the end result to be this list:
['30 jan 2022', '6 apr 2022', '12 oct 2022']
So far, with this expression:
regex.findall(r'(?:\d\d | \d )(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)(?:.*)20(?:\d\d)', d, overlapped=True)
I am getting:
['30 jan and 6 apr and 12 oct 2022', ' 6 apr and 12 oct 2022', '12 oct 2022']
Thanks in advance.
CodePudding user response:
You might use a list comprehension and 2 capture groups:
\b(\d+ (?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec))(?=.*\b(20\d\d))\b
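import re

# Group 1: day and month. Group 2 (inside the lookahead): the year further on in the string.
pattern = r"\b(\d+ (?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec))(?=.*\b(20\d\d))\b"
s = "30 jan and 6 apr and 12 oct 2022"
# findall returns (day_month, year) tuples; join each tuple into one date string.
res = [' '.join(groups) for groups in re.findall(pattern, s)]
print(res)
Output
['30 jan 2022', '6 apr 2022', '12 oct 2022']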
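It may help to look at what re.findall returns before the join, and at what happens when the string holds more than one year. The sketch below is only an illustration: the helper name extract_dates, the two-year sample string, and the lazy `.*?` variant are assumptions that are not part of the question. With the greedy `.*`, every date is paired with the last year in the string; the lazy version pairs it with the nearest following year.

```python
import re

MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"

# Group 1 is "day month"; group 2 is the year, captured inside a lookahead
# so it is not consumed and stays available to the following matches.
GREEDY = rf"\b(\d+ (?:{MONTHS}))(?=.*\b(20\d\d))\b"
# Assumption: a lazy variant that stops at the nearest year after the date.
LAZY = rf"\b(\d+ (?:{MONTHS}))(?=.*?\b(20\d\d))\b"


def extract_dates(pattern, text):
    """Hypothetical helper: join each (day_month, year) tuple into one string."""
    return [" ".join(groups) for groups in re.findall(pattern, text)]


# With two capture groups, findall yields a list of 2-tuples.
raw = re.findall(GREEDY, "30 jan and 6 apr and 12 oct 2022")
print(raw)  # [('30 jan', '2022'), ('6 apr', '2022'), ('12 oct', '2022')]

two_years = "1 feb 2021 and 3 mar 2022"  # made-up sample with two years
greedy_result = extract_dates(GREEDY, two_years)
lazy_result = extract_dates(LAZY, two_years)
print(greedy_result)  # ['1 feb 2022', '3 mar 2022']
print(lazy_result)    # ['1 feb 2021', '3 mar 2022']
```

Which variant is right depends on the data: if a string only ever carries one trailing year, as in the question, both give the same result.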
Note that (?:.*) and (?:\d\d) do not need the non-capturing group, as the group by itself has no purpose in the pattern.
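A quick way to check that claim is to run the pattern with and without the extra groups and compare the results. This is a rough sketch that uses the standard re module rather than regex (so there is no overlapped option), and the variable names are made up for the example.

```python
import re

months = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"

# The pattern as written in the question, with the redundant groups.
with_groups = rf"(?:\d\d | \d )(?:{months})(?:.*)20(?:\d\d)"
# The same pattern with the groups around .* and \d\d removed.
without_groups = rf"(?:\d\d | \d )(?:{months}).*20\d\d"

text = "30 jan and 6 apr and 12 oct 2022"
a = re.findall(with_groups, text)
b = re.findall(without_groups, text)

print(a)       # ['30 jan and 6 apr and 12 oct 2022']
print(a == b)  # True
```

The groups around the alternations are still needed, because without them the | would split the whole pattern instead of just the list of options.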