I want to find all dates in a text that are not immediately preceded by the word Effective. For example, I have the following line:
FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022
My regex should return ['January 7, 2022', 'January 5, 2022']
How can I do this in Python?
My attempt:
>>> import re
>>> rule = '((?<!Effective\ )([A-Za-z]{3,9}\ *\d{1,2}\ *,\ *\d{4}))'
>>> text = 'FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022'
>>> re.findall(rule, text)
[('anuary 1, 2022', 'anuary 1, 2022'), ('January 7, 2022', 'January 7, 2022'), ('January 5, 2022', 'January 5, 2022')]
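But it doesn't work: the date after Effective is still matched, only without its first letter ('anuary 1, 2022'), and every result comes back as a tuple because of the two capturing groups.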
CodePudding user response:
You can use
\b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)
Details:
\b - a word boundary
(?<!Effective\s) - a negative lookbehind that fails the match if there is Effective followed by a whitespace char immediately to the left of the current location
[A-Za-z]{3,9} - three to nine ASCII letters
\s* - zero or more whitespaces
\d{1,2} - one or two digits
\s*,\s* - a comma enclosed with zero or more whitespaces
\d{4} - four digits
(?!\d) - a negative lookahead that fails the match if there is a digit immediately on the right.
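For completeness, here is a minimal sketch of the pattern applied to the sample line from the question. The variable names (DATE_NOT_AFTER_EFFECTIVE, dates) are only illustrative. Since the pattern has no capturing groups, re.findall returns plain strings rather than tuples.

```python
import re

# Pattern from the answer; the raw string keeps \b, \s and \d intact.
# The \b stops the engine from starting the match in the middle of "January",
# which is what produced 'anuary 1, 2022' in the original attempt.
DATE_NOT_AFTER_EFFECTIVE = re.compile(
    r'\b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)'
)

text = ('FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 '
        'ALASKA DISCLAIMER The January 5, 2022')

# No capturing groups, so findall yields the full matches as strings.
dates = DATE_NOT_AFTER_EFFECTIVE.findall(text)
print(dates)  # ['January 7, 2022', 'January 5, 2022']
```

Note that Python's re module only accepts fixed-width lookbehinds, so the lookbehind checks for exactly one whitespace character after Effective; something like (?<!Effective\s+) would raise an error.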