Finding dates in text using regex

Time:04-11

I want to find all dates in a text, except those immediately preceded by the word Effective. For example, I have the following line:

FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022

My regex should return ['January 7, 2022', 'January 5, 2022']

How can I do this in Python?

My attempt:

>>> import re
>>> rule = '((?<!Effective\ )([A-Za-z]{3,9}\ *\d{1,2}\ *,\ *\d{4}))'
>>> text = 'FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 ALASKA DISCLAIMER The January 5, 2022'
>>> re.findall(rule, text)
[('anuary 1, 2022', 'anuary 1, 2022'), ('January 7, 2022', 'January 7, 2022'), ('January 5, 2022', 'January 5, 2022')]

But it doesn't work: the date after Effective is still returned, only without its first letter ('anuary 1, 2022').

CodePudding user response:

You can use

\b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)

Details:

  • \b - a word boundary
  • (?<!Effective\s) - a negative lookbehind that fails the match if there is Effective followed by a whitespace char immediately to the left of the current location
  • [A-Za-z]{3,9} - three to nine ASCII letters
  • \s* - zero or more whitespaces
  • \d{1,2} - one or two digits
  • \s*,\s* - a comma enclosed with zero or more whitespaces
  • \d{4} - four digits
  • (?!\d) - a negative lookahead that fails the match if there is a digit immediately on the right.
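
As a minimal sketch, the pattern above can be applied to the sample line from the question with re.findall. The names DATE_NOT_AFTER_EFFECTIVE and find_dates are made up for this illustration. The pattern has no capturing groups, so findall returns plain strings rather than tuples.

```python
import re

# Pattern from the answer above, written as a raw string so the
# backslashes reach the regex engine unchanged.
# DATE_NOT_AFTER_EFFECTIVE and find_dates are illustrative names only.
DATE_NOT_AFTER_EFFECTIVE = re.compile(
    r'\b(?<!Effective\s)[A-Za-z]{3,9}\s*\d{1,2}\s*,\s*\d{4}(?!\d)'
)

def find_dates(text):
    """Return date-like substrings not directly preceded by 'Effective' plus one whitespace."""
    return DATE_NOT_AFTER_EFFECTIVE.findall(text)

text = ('FEE SCHEDULE Effective January 1, 2022 STATE OF January 7, 2022 '
        'ALASKA DISCLAIMER The January 5, 2022')
dates = find_dates(text)
print(dates)  # ['January 7, 2022', 'January 5, 2022']
```

The \b is what fixes the original attempt: without it, the lookbehind only rejects the position at J, and the engine then starts a match one character later at a. With \b, a match can only start at the beginning of a word. Note that the lookbehind checks for exactly one whitespace character, so a date separated from Effective by two or more spaces would still be matched.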