My regex matches a year range in the string when I only want to match phone numbers.
Here is an example test string:
Call on tel: (425) 882-8080 or 852-9876 and it does not
necessarily reflect the views of Microsoft Corp from 1986-1989.
Here the matched strings are:
(425) 882-8080
852-9876
986-1989
It takes "986-1989" from the date "1986-1989".
My regex:
((?:\d{1}[-\/\.\s]|\ \d{2}[-\/\.\s]??|\d{2}[-\/\.\s]??|\d{3}[-\/\.\s]??|\d{4}[-\/\.\s]??)?(?:\d{3}[-\/\.\s]??\d{3}[-\/\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4}))
Any suggestions on how to change this regex so that it doesn't consider the year?
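A minimal sketch to reproduce the problem, assuming Python's re module (the question itself does not name a language; the variable names are illustrative only). The pattern is copied unchanged and only split across two string literals for readability:

```python
import re

# Test string from the question, joined into a single line.
text = ("Call on tel: (425) 882-8080 or 852-9876 and it does not "
        "necessarily reflect the views of Microsoft Corp from 1986-1989.")

# The original pattern, unchanged (adjacent raw strings are concatenated).
original = re.compile(
    r"((?:\d{1}[-\/\.\s]|\ \d{2}[-\/\.\s]??|\d{2}[-\/\.\s]??|\d{3}[-\/\.\s]??|\d{4}[-\/\.\s]??)?"
    r"(?:\d{3}[-\/\.\s]??\d{3}[-\/\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4}))"
)

# Collect the full text of every match.
matches = [m.group(0) for m in original.finditer(text)]
print(matches)  # ['(425) 882-8080', '852-9876', '986-1989']
```

The last entry shows the unwanted match: nothing in the pattern stops it from starting at the second digit of 1986.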
CodePudding user response:
You can use a word boundary (\b) to ensure that the three-digit section of the phone number is not preceded by another digit or word character:
(?:\(\d{3}\) )?\b\d{3}-\d{4}
https://regex101.com/r/AvkLzT/1
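A quick way to try this pattern outside regex101, sketched in Python as an assumption (the answer does not name a language; variable names are illustrative):

```python
import re

# Test string from the question, joined into a single line.
text = ("Call on tel: (425) 882-8080 or 852-9876 and it does not "
        "necessarily reflect the views of Microsoft Corp from 1986-1989.")

# Optional "(ddd) " area code, then a word boundary before the 7-digit part.
pattern = re.compile(r"(?:\(\d{3}\) )?\b\d{3}-\d{4}")

found = pattern.findall(text)
print(found)  # ['(425) 882-8080', '852-9876']

# The boundary only guards the start, so trailing digits are not rejected.
trailing = pattern.findall("852-98765")
print(trailing)  # ['852-9876']
```

Note that this simplified pattern only covers the two formats in the sample string (a single space after the area code and a hyphen as the separator), and that it has no check after the last four digits.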
CodePudding user response:
In Python, you can use lookarounds, so that the pattern only matches a phone number when there is no digit immediately before or after the potential match.
re.findall(r'(?<!\d)(?:\d[-/.\s]|\ \d{2}[-/.\s]?|\d{2,4}[-/.\s]?)?(?:\d{3}[-/.\s]?\d{3}[-/.\s]?\d{4}|\(\d{3}\)\s*\d{3}[-.\s]?\d{4}|\d{3}[-.\s]?\d{4})(?!\d)', text)
See the regex demo.
Note that you do not need to escape the forward slash in a Python regex string, since / is not a special regex metacharacter.
Note also the (?<!\d) lookbehind and the (?!\d) lookahead, which fail the match if there is a digit immediately before or after the pattern, respectively.
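As a self-contained check, here is the same pattern run against the test string from the question (the variable names are illustrative, and the pattern is only split across string literals for readability):

```python
import re

# Test string from the question, joined into a single line.
text = ("Call on tel: (425) 882-8080 or 852-9876 and it does not "
        "necessarily reflect the views of Microsoft Corp from 1986-1989.")

phone = re.compile(
    r"(?<!\d)"                                           # no digit right before
    r"(?:\d[-/.\s]|\ \d{2}[-/.\s]?|\d{2,4}[-/.\s]?)?"    # optional prefix
    r"(?:\d{3}[-/.\s]?\d{3}[-/.\s]?\d{4}"                # ddd ddd dddd
    r"|\(\d{3}\)\s*\d{3}[-.\s]?\d{4}"                    # (ddd) ddd dddd
    r"|\d{3}[-.\s]?\d{4})"                               # ddd dddd
    r"(?!\d)"                                            # no digit right after
)

numbers = phone.findall(text)
print(numbers)  # ['(425) 882-8080', '852-9876']
```

The year range is skipped because a match can no longer start at the second digit of 1986: the lookbehind sees the leading 1 and fails.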
I suggest replacing ?? with ? in your regex because the lazy quantifier has no advantage here.
The \d{2}[-\/\.\s]??|\d{3}[-\/\.\s]??|\d{4}[-\/\.\s]?? part only differs in the number of digits matched, so I shortened it to \d{2,4}[-/.\s]?.
The . char inside a character class needs no escaping, as it only denotes a literal dot there ([.] = \. in regex).
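A small sanity check of these simplifications, as a sketch with made-up sample inputs. It only compares which strings each form accepts as a whole, not which alternative the engine prefers while searching:

```python
import re

# Escaped and unescaped separator classes accept the same single characters.
escaped = re.compile(r"[-\/\.\s]")
plain = re.compile(r"[-/.\s]")
chars = ["-", "/", ".", " ", "\t", "a", "1", "\\"]
same_class = all(
    bool(escaped.fullmatch(c)) == bool(plain.fullmatch(c)) for c in chars
)

# The three-way digit alternation accepts the same strings as \d{2,4}.
alternation = re.compile(r"\d{2}|\d{3}|\d{4}")
ranged = re.compile(r"\d{2,4}")
digits = ["1", "12", "123", "1234", "12345"]
same_digits = all(
    bool(alternation.fullmatch(s)) == bool(ranged.fullmatch(s)) for s in digits
)

# Inside a character class the dot is a literal dot, not "any character".
dot_is_literal = (
    re.fullmatch(r"[.]", ".") is not None and re.fullmatch(r"[.]", "a") is None
)

print(same_class, same_digits, dot_is_literal)  # True True True
```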