Python Regex re.search - groupdict() - Date format matching

Time:10-06

I need to get the date (day) and month from various strings such as '14th oct', '14oct', '14.10', '14 10' and '14/10'. For these cases the code below works fine.

import re

query = '14.oct'
print(re.search(r'(?P<date>\b\d{1,2})(?:\b|st|nd|rd|th)?(?:[\s\.\-/_\\,]*)(?P<month>\d{1,2}|[a-z]{3,9})', query, re.I).groupdict())

Result:

{'date': '14', 'month': 'oct'}

But for the input (1410), it still captures a date and a month. I don't want that, because here the whole string is a number in a different format and should not be treated as a date and month. The result should be None.

How to change the search pattern for this? (with groupdict() only)

CodePudding user response:

How to change the search pattern for this?

You might try a negative lookbehind assertion for a literal ( combined with a negative lookahead assertion for a literal ), as follows:

import re
query = '14.oct'
noquery = '(1410)'
print(re.search(r'(?<!\()(?P<date>\b\d{1,2})(?:\b|st|nd|rd|th)?(?:[\s\.\-/_\\,]*)(?P<month>\d{1,2}|[a-z]{3,9})(?!\))', query, re.I).groupdict())
print(re.search(r'(?<!\()(?P<date>\b\d{1,2})(?:\b|st|nd|rd|th)?(?:[\s\.\-/_\\,]*)(?P<month>\d{1,2}|[a-z]{3,9})(?!\))', noquery, re.I))

output

{'date': '14', 'month': 'oct'}
None

Beware that this also rejects other parenthesised forms, i.e. not only (1410) but also (14 10), (14/10) and so on.
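That caveat can be checked directly. The following is a minimal sketch rather than part of the original answer: the helper name find_day_month is hypothetical, and the pattern is the one shown above, split across two string literals only for readability.

```python
import re

# Pattern from the answer above: reject a match that starts right after "("
# or ends right before ")".
PATTERN = re.compile(
    r'(?<!\()(?P<date>\b\d{1,2})(?:\b|st|nd|rd|th)?(?:[\s\.\-/_\\,]*)'
    r'(?P<month>\d{1,2}|[a-z]{3,9})(?!\))',
    re.I,
)


def find_day_month(text):
    """Hypothetical helper: groupdict of the first match, or None."""
    m = PATTERN.search(text)
    return m.groupdict() if m else None


for s in ['14.oct', '(1410)', '(14 10)', '(14/10)', '1410']:
    print(s, '->', find_day_month(s))
```

The three parenthesised inputs give None. A bare 1410 without parentheses still matches as date 14 and month 10, so this approach only helps if the parentheses are really part of the input.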

CodePudding user response:

It is not clear whether you want to exclude 1410 as a bare four-digit number or only (1410) with the parentheses. To exclude both, you can assert that the date does not start a standalone number of exactly four digits:

(?P<date>\b(?!\d{4}\b)\d{1,2})(?:st|[nr]d|th)?[\s./_\\,-]*(?P<month>\d{1,2}|[a-z]{3,9})
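A short way to try this pattern against the inputs from the question is sketched below. The helper name day_month_or_none is hypothetical; the pattern is unchanged apart from being split across two string literals.

```python
import re

# (?!\d{4}\b) rejects a start position that begins a run of exactly
# four digits, such as "1410".
PATTERN = re.compile(
    r'(?P<date>\b(?!\d{4}\b)\d{1,2})(?:st|[nr]d|th)?[\s./_\\,-]*'
    r'(?P<month>\d{1,2}|[a-z]{3,9})',
    re.I,
)


def day_month_or_none(text):
    """Hypothetical helper: groupdict of the first match, or None."""
    m = PATTERN.search(text)
    return m.groupdict() if m else None


for s in ['14th oct', '14oct', '14.10', '14 10', '14/10', '1410', '(1410)']:
    print(s, '->', day_month_or_none(s))
```

With this pattern both 1410 and (1410) give None, while the five date-like inputs still produce a date and a month.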


To skip any date inside parentheses, match the parenthesised part first, so that the named groups stay unset for it:

\([^()]*\)|(?P<date>\b\d{1,2})(?:st|[nr]d|th)?[\s./_\\,-]*(?P<month>\d{1,2}|[a-z]{3,9})
  • \([^()]*\) Match from opening till closing parenthesis
  • | Or
  • (?P<date>\b\d{1,2}) Match 1-2 digits
  • (?:st|[nr]d|th)? Optionally match st nd rd th
  • [\s./_\\,-]* Optionally repeat matching any of the listed
  • (?P<month>\d{1,2}|[a-z]{3,9}) Match 1-2 digits or 3-9 chars a-z


For example

import re

pattern = r"\([^()]*\)|(?P<date>\b\d{1,2})(?:st|[nr]d|th)?(?:[\s./_\\,-]*)(?P<month>\d{1,2}|[a-z]{3,9})"
strings = ["14th oct", "14oct", "14.10", "14 10", "14/10", "1410", "(1410)"]

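for s in strings:
    m = re.search(pattern, s, re.I)
    # m is None when nothing matches at all; the named groups are None
    # when only the parenthesised alternative matched.
    if m and m.group("date"):
        print(m.groupdict())
    else:
        print(f"{s} --> Not valid")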

Output

{'date': '14', 'month': 'oct'}
{'date': '14', 'month': 'oct'}
{'date': '14', 'month': '10'}
{'date': '14', 'month': '10'}
{'date': '14', 'month': '10'}
{'date': '14', 'month': '10'}
(1410) --> Not valid
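Because re.search stops at the first match, a parenthesised number early in the string hides a valid date that comes later. One possible sketch, assuming later dates should still be found: the hypothetical helper day_month below walks through re.finditer and returns the first match whose date group is set, or None.

```python
import re

PATTERN = re.compile(
    r"\([^()]*\)|(?P<date>\b\d{1,2})(?:st|[nr]d|th)?[\s./_\\,-]*"
    r"(?P<month>\d{1,2}|[a-z]{3,9})",
    re.I,
)


def day_month(text):
    """Hypothetical helper: first date found outside parentheses, else None."""
    for m in PATTERN.finditer(text):
        # Matches of the parenthesised alternative leave "date" unset.
        if m.group("date") is not None:
            return m.groupdict()
    return None


print(day_month("(1410)"))
print(day_month("14/10"))
print(day_month("(1410) 14 oct"))
```

This keeps the groupdict() result for real dates and gives None both when nothing matches and when only parenthesised text was found.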