Python Regex: US phone number parsing

Time:09-16

I am a complete newbie in Regex.

I need to parse US phone numbers written in various formats into 3 strings: the area code (without the '()'), the next 3 digits, and the last 4 digits (without the '-').

I also need to reject the following (with the message Error):

916-111-1111 ('-' after the area code)

(916)111 -1111 (white space before the '-')

( 916)111-1111 (any space inside the area code parentheses); (916 ) must be rejected too

(a56)111-1111 (any non-digits inside the area code)

any number lacking the '()' around the area code

This should be OK: ' (916) 111-1111 ' (spaces anywhere except as above)

Here is my regex:

^\s*\(?(\d{3})[\)\-][\s]*?(\d{3})[-]?(\d{4})\s*$

This took me 2 days.

It did not reject 916-111-1111 (a '-' after the area code instead of '()' around it). I am sure there are other deficiencies as well.

I would appreciate your help very much. Even hints.

Valid:

'(916) 111-1111'
'(916)111-1111     '
'   (916)      111-1111'

Invalid:

'916-111-1111' - no '()' around the area code, and a '-' after it
'(916)111 -1111' - space before the '-'
'( 916)111-1111' - space inside the '()'
'(abc) 111-11i1' - non-digit characters

CodePudding user response:

You can do this:

import re
# Anchored with ^ and $ so extra digits or stray text are rejected; the last group is exactly 4 digits.
r = r'^\s*\((\d{3})\)\s*(\d{3})-(\d{4})\s*$'
l = ['(916) 111-1111', '(916)111-1111     ', '   (916)      111-1111', '916-111-1111', '(916)111 -1111', '( 916)111-1111', '(abc) 111-11i1']
print([re.findall(r, x) for x in l])

# [[('916', '111', '1111')], [('916', '111', '1111')], [('916', '111', '1111')], [], [], [], []]
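
To handle one input at a time and report Error on bad input, a pattern like this can be wrapped in a small function. This is a minimal sketch under a few assumptions, not the answerer's code: the name parse_us_phone is hypothetical, raising ValueError with the message "Error" is just one way to signal rejection, and whitespace after the '-' is assumed to be invalid because the question does not list that case. It uses re.fullmatch, so the whole string has to match. Requiring the literal parentheses is what rejects '916-111-1111'; the question's pattern accepts it because `\(?` makes the opening parenthesis optional and `[\)\-]` allows a '-' in place of the closing one.

```python
import re

# Optional outer whitespace, "(ddd)", optional whitespace, then "ddd-dddd".
# re.ASCII restricts \d to 0-9 and \s to ASCII whitespace.
PHONE_RE = re.compile(r"\s*\((\d{3})\)\s*(\d{3})-(\d{4})\s*", re.ASCII)


def parse_us_phone(text):
    """Return (area_code, next_three, last_four) as strings.

    Hypothetical helper: raises ValueError("Error") when the input
    does not match the accepted format.
    """
    match = PHONE_RE.fullmatch(text)
    if match is None:
        raise ValueError("Error")
    return match.groups()


if __name__ == "__main__":
    for sample in ["(916) 111-1111", "916-111-1111", "(916 )111-1111"]:
        try:
            print(repr(sample), "->", parse_us_phone(sample))
        except ValueError as exc:
            print(repr(sample), "->", exc)
```

Compiling the pattern once keeps the function cheap to call in a loop, and returning match.groups() gives the three strings in the order the question asks for.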

CodePudding user response:

You can simplify the regex as follows:

"(\d{1,3})\D*(\d{3})\D*(\d{4})"

Code:

import re

if __name__ == '__main__':
    phone_numbers = ["916-111-1111", "(916)111 -1111", "( 916)111-1111", "- (916 )111-1111",  "(a56)111-1111"]
    for phone_number_str in phone_numbers:
        # Raw string so the backslashes reach the regex engine unchanged.
        results = re.findall(r"(\d{1,3})\D*(\d{3})\D*(\d{4})", phone_number_str)
        if 0 == len(results):
            print(f"[main] FAIL: {phone_number_str}")
            continue  # no match, so results[0] would raise IndexError
        print(results[0])

Result:

('916', '111', '1111')
('916', '111', '1111')
('916', '111', '1111')
('916', '111', '1111')
('56', '111', '1111')

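Note: \D matches any non-digit character, so this pattern extracts the digit groups from all five inputs, including the formats the question wants rejected.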
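
Because `\D*` skips any run of non-digits, a lenient pattern like this cannot tell well-formed input from malformed input. One way to see the difference is to run the question's valid and invalid samples through each candidate pattern and list the ones it gets wrong. The sketch below is only an illustration: the helper name misclassified and the "strict" pattern are assumptions, not part of either answer, and the question's original pattern is included for comparison.

```python
import re

# Samples copied from the question.
VALID = ["(916) 111-1111", "(916)111-1111     ", "   (916)      111-1111"]
INVALID = ["916-111-1111", "(916)111 -1111", "( 916)111-1111", "(abc) 111-11i1"]

PATTERNS = {
    # The asker's original attempt.
    "question": r"^\s*\(?(\d{3})[\)\-][\s]*?(\d{3})[-]?(\d{4})\s*$",
    # The lenient extraction pattern.
    "lenient": r"(\d{1,3})\D*(\d{3})\D*(\d{4})",
    # An anchored pattern that requires "(ddd)" and "ddd-dddd" (assumption).
    "strict": r"^\s*\((\d{3})\)\s*(\d{3})-(\d{4})\s*$",
}


def misclassified(pattern):
    """Return the valid samples the pattern misses plus the invalid ones it accepts."""
    missed = [s for s in VALID if not re.search(pattern, s)]
    accepted = [s for s in INVALID if re.search(pattern, s)]
    return missed + accepted


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(name, misclassified(pattern))
```

On these samples the question's pattern wrongly accepts only '916-111-1111', the lenient pattern wrongly accepts the first three invalid samples, and the anchored pattern gets every sample right.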
