Regex for differentiating between phone number and year

Time:02-09

My regex also matches part of a year range in the string, but I only want to match phone numbers.

Here is an example test string:

Call on tel: (425) 882-8080 or 852-9876 and it does not 
necessarily reflect the views of Microsoft Corp from 1986-1989.

Here the matched strings are:

(425) 882-8080
852-9876
986-1989

It picks up "986-1989" from the year range "1986-1989".

My regex:

((?:\d{1}[-\/\.\s]|\ \d{2}[-\/\.\s]??|\d{2}[-\/\.\s]??|\d{3}[-\/\.\s]??|\d{4}[-\/\.\s]??)?(?:\d{3}[-\/\.\s]??\d{3}[-\/\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4}))
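To reproduce the problem, here is a minimal sketch that runs the regex above unchanged against the sample string, assuming Python's `re` module (the question does not say which regex engine is used). Because the whole pattern is wrapped in one capturing group, `re.findall` returns that group, which here is the full match.

```python
import re

# The question's regex, copied verbatim (raw string, so backslashes reach re unchanged).
PATTERN = r"((?:\d{1}[-\/\.\s]|\ \d{2}[-\/\.\s]??|\d{2}[-\/\.\s]??|\d{3}[-\/\.\s]??|\d{4}[-\/\.\s]??)?(?:\d{3}[-\/\.\s]??\d{3}[-\/\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4}))"

# Sample text from the question, joined onto one line.
text = ("Call on tel: (425) 882-8080 or 852-9876 and it does not "
        "necessarily reflect the views of Microsoft Corp from 1986-1989.")

matches = re.findall(PATTERN, text)
print(matches)  # ['(425) 882-8080', '852-9876', '986-1989']
```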

Any suggestions on how to change this regex so that it doesn't consider the year?

Answer:

You can use a word boundary (\b) to make sure the three-digit part of the phone number is not preceded by a letter, digit or underscore, so a match can no longer start inside "1986":

(?:\(\d{3}\) )?\b\d{3}-\d{4}

https://regex101.com/r/AvkLzT/1
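A quick way to check this answer outside regex101 is a short sketch with Python's `re` module (an assumption; the answer itself only links the online demo). The pattern has no capturing groups, so `re.findall` returns the full matches:

```python
import re

# Optional "(ddd) " area code, then a word boundary before the 3-digit part.
PATTERN = r"(?:\(\d{3}\) )?\b\d{3}-\d{4}"

text = ("Call on tel: (425) 882-8080 or 852-9876 and it does not "
        "necessarily reflect the views of Microsoft Corp from 1986-1989.")

# \b fails between "1" and "9" in "1986", so "986-1989" is no longer found.
matches = re.findall(PATTERN, text)
print(matches)  # ['(425) 882-8080', '852-9876']
```

Note that this pattern only guards the left edge: on "852-98765" it still returns "852-9876". Adding a trailing \b is one way to guard the right edge as well.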

Answer:

In Python, you can use lookarounds to match phone numbers only when there is no digit immediately before or after the potential match.

re.findall(r'(?<!\d)(?:\d[-/.\s]|\ \d{2}[-/.\s]?|\d{2,4}[-/.\s]?)?(?:\d{3}[-/.\s]?\d{3}[-/.\s]?\d{4}|\(\d{3}\)\s*\d{3}[-.\s]?\d{4}|\d{3}[-.\s]?\d{4})(?!\d)', text)

See the regex demo.
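The call above assumes `text` is already defined. A self-contained sketch, run on the question's sample string, might look like this; the pattern is the answer's, split into adjacent raw strings only for readability (Python concatenates them into exactly the same pattern):

```python
import re

PATTERN = (
    r"(?<!\d)"                                         # no digit right before the match
    r"(?:\d[-/.\s]|\ \d{2}[-/.\s]?|\d{2,4}[-/.\s]?)?"  # optional leading digits
    r"(?:\d{3}[-/.\s]?\d{3}[-/.\s]?\d{4}"              # 3-3-4 digits
    r"|\(\d{3}\)\s*\d{3}[-.\s]?\d{4}"                  # (ddd) ddd-dddd
    r"|\d{3}[-.\s]?\d{4})"                             # ddd-dddd
    r"(?!\d)"                                          # no digit right after the match
)

text = ("Call on tel: (425) 882-8080 or 852-9876 and it does not "
        "necessarily reflect the views of Microsoft Corp from 1986-1989.")

matches = re.findall(PATTERN, text)
print(matches)  # ['(425) 882-8080', '852-9876']
```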

Note that you do not need to escape the forward slash in a Python regex, since / is not a special regex metacharacter.
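A tiny check of this with Python's `re` module, using a made-up slash-separated sample: the escaped and unescaped slash find the same text.

```python
import re

sample = "852/9876"

# "/" is an ordinary character in Python's re, so "\/" and "/" behave the same.
with_escape = re.findall(r"\d{3}\/\d{4}", sample)
without_escape = re.findall(r"\d{3}/\d{4}", sample)

print(with_escape)     # ['852/9876']
print(without_escape)  # ['852/9876']
```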

Note also the (?<!\d) lookbehind and the (?!\d) lookahead, which fail the match if there is a digit immediately before or after it, respectively.
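The effect of the two lookarounds can be isolated with a stripped-down pattern that only handles the ddd-dddd form (a simplification for illustration, not the answer's full regex):

```python
import re

text = "1986-1989 and 852-9876"

# Without lookarounds, a match can start in the middle of "1986".
plain = re.findall(r"\d{3}-\d{4}", text)
print(plain)    # ['986-1989', '852-9876']

# (?<!\d) forbids a digit right before the match, (?!\d) right after it.
guarded = re.findall(r"(?<!\d)\d{3}-\d{4}(?!\d)", text)
print(guarded)  # ['852-9876']
```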

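I suggest replacing ?? with ? in your regex. The lazy quantifier has no advantage here: every optional separator is followed by a digit or an opening parenthesis, which the separator class can never match, so the lazy and greedy versions find exactly the same matches.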
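A spot check of this, assuming Python's `re` module and using just the \d{3}[-.\s]??\d{4} piece of the question's regex rather than the whole pattern:

```python
import re

text = ("Call on tel: (425) 882-8080 or 852-9876 and it does not "
        "necessarily reflect the views of Microsoft Corp from 1986-1989.")

# Same sub-pattern with a lazy (??) and a greedy (?) optional separator.
lazy = re.findall(r"\d{3}[-.\s]??\d{4}", text)
greedy = re.findall(r"\d{3}[-.\s]?\d{4}", text)

print(lazy)            # ['882-8080', '852-9876', '986-1989']
print(lazy == greedy)  # True
```

Both versions still return "986-1989", so this change is about readability; the year problem is handled by the lookarounds.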

The \d{2}[-\/\.\s]??|\d{3}[-\/\.\s]??|\d{4}[-\/\.\s]?? alternatives differ only in the number of digits they match, so I merged them into \d{2,4}[-/.\s]?.
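A brute-force sketch (Python's `re.fullmatch` over every short string built from a small, arbitrarily chosen alphabet) that the three alternatives and the merged form accept exactly the same strings:

```python
import itertools
import re

# The three original alternatives, grouped, versus the merged quantifier.
alternatives = re.compile(r"(?:\d{2}[-\/\.\s]??|\d{3}[-\/\.\s]??|\d{4}[-\/\.\s]??)")
merged = re.compile(r"\d{2,4}[-/.\s]?")

# Try every string of length 1..6 over digits, separators and a letter.
alphabet = "1-/. x"
for n in range(1, 7):
    for chars in itertools.product(alphabet, repeat=n):
        s = "".join(chars)
        assert bool(alternatives.fullmatch(s)) == bool(merged.fullmatch(s)), s
print("same set of strings")
```

This compares the sets of strings each form accepts. Inside the full regex the two forms try digit counts in a different order (the alternation tries 2 digits first, \d{2,4} tries 4 first), which is why the lookarounds, not this rewrite, are what keep matches from touching other digits.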

The . character needs no escaping inside a character class, because there it denotes only a literal dot ([.] is equivalent to \. in a regex).
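A short illustration with Python's `re` module and a made-up sample string: outside a class . matches any character, while [.] and \. both match only a literal dot.

```python
import re

text = "v1.2 or v1x2"

any_char = re.findall(r"1.2", text)    # bare . matches any character
in_class = re.findall(r"1[.]2", text)  # [.] matches only a literal dot
escaped = re.findall(r"1\.2", text)    # \. is the same as [.]

print(any_char)  # ['1.2', '1x2']
print(in_class)  # ['1.2']
print(escaped)   # ['1.2']
```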
