I am a complete newbie to regex.
I need to parse US phone numbers in different formats into 3 strings: the area code (without '()'), the next 3 digits, and the last 4 digits, with no '-'.
I also need to reject the following (with the message "Error"):
916-111-1111 ('-' after the area code)
(916)111 -1111 (whitespace before '-')
( 916)111-1111 (any space inside the area code); (916 ) must be rejected too
(a56)111-1111 (any non-digits inside the area code)
a missing '()' around the area code
It should accept ' (916) 111-1111 ' (spaces are fine anywhere except as above).
Here is my regex:
^\s*\(?(\d{3})[\)\-][\s]*?(\d{3})[-]?(\d{4})\s*$
This took me 2 days.
It does not reject 916-111-1111 (a '-' after the area code). I am sure it has other deficiencies too.
I would really appreciate your help, even just hints.
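A quick way to see why '916-111-1111' slips through: in the pattern above, \(? makes the opening parenthesis optional, and the character class [\)\-] accepts either ')' or '-' after the area code. The sketch below compares it with a variant (named strict here purely for illustration) that requires both parentheses and the dash literally; it is a diagnostic, not a complete solution.

```python
import re

# The asker's original pattern
original = r'^\s*\(?(\d{3})[\)\-][\s]*?(\d{3})[-]?(\d{4})\s*$'
# Illustrative variant: '(' and ')' are both required, and so is the '-'
strict = r'^\s*\((\d{3})\)\s*(\d{3})-(\d{4})\s*$'

for s in ['916-111-1111', '(916) 111-1111']:
    # True means the pattern accepted the string
    print(repr(s), bool(re.match(original, s)), bool(re.match(strict, s)))
# '916-111-1111' True False
# '(916) 111-1111' True True
```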
Valid:
'(916) 111-1111'
'(916)111-1111 '
' (916) 111-1111'
Invalid:
'916-111-1111' - no '()' and a '-' after the area code
'(916)111 -1111' - no space allowed before '-'
'( 916)111-1111' - no space allowed inside '()'
'(abc) 111-11i1' - contains non-digits
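Putting the Valid and Invalid lists together, here is a minimal sketch of a parser that returns the three strings or the message 'Error'. The names PHONE and parse_phone are hypothetical, and the pattern reflects one reading of "spaces anywhere except as above": whitespace is allowed only at the ends and between ')' and the next three digits. re.VERBOSE lets each piece of the pattern carry its own comment.

```python
import re

# Hypothetical pattern; whitespace inside it is ignored because of re.VERBOSE
PHONE = re.compile(r"""
    \s*               # optional leading whitespace
    \( (\d{3}) \)     # area code: exactly 3 digits inside literal parentheses
    \s*               # optional whitespace after ')'
    (\d{3})           # next 3 digits
    -                 # dash, with no whitespace before it
    (\d{4})           # last 4 digits
    \s*               # optional trailing whitespace
""", re.VERBOSE)


def parse_phone(text):
    """Hypothetical helper: return (area, prefix, line) or the string 'Error'."""
    match = PHONE.fullmatch(text)  # the whole string must match, not just part of it
    return match.groups() if match else 'Error'


for s in ['(916) 111-1111', '(916)111-1111 ', ' (916) 111-1111',
          '916-111-1111', '(916)111 -1111', '( 916)111-1111', '(abc) 111-11i1']:
    print(repr(s), parse_phone(s))
```

If spaces after the '-' should also be allowed, adding another \s* after the dash would be the place to do it.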
Answer:
You can do this:
```python
import re

# Anchored with ^...$ so the whole string must match; exactly 4 final digits
r = r'^\s*\((\d{3})\)\s*(\d{3})-(\d{4})\s*$'
l = ['(916) 111-1111', '(916)111-1111 ', ' (916) 111-1111', '916-111-1111', '(916)111 -1111', '( 916)111-1111', '(abc) 111-11i1']
print([re.findall(r, x) for x in l])
# [[('916', '111', '1111')], [('916', '111', '1111')], [('916', '111', '1111')], [], [], [], []]
```
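One detail worth checking with re.findall: it searches anywhere in the string, so a pattern without ^...$ anchors can report a match for input that has extra characters around a valid-looking number. re.fullmatch instead requires the entire string to match. A small comparison, using an unanchored pattern chosen for illustration:

```python
import re

pattern = r'\((\d{3})\)\s*(\d{3})-(\d{4})'  # deliberately no anchors

for s in ['(916) 111-11111', 'x(916) 111-1111']:
    # findall finds a substring match; fullmatch rejects the whole string
    print(repr(s), re.findall(pattern, s), re.fullmatch(pattern, s))
# '(916) 111-11111' [('916', '111', '1111')] None
# 'x(916) 111-1111' [('916', '111', '1111')] None
```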
Answer:
You can simplify the regex as follows:
"(\d{1,3})\D*(\d{3})\D*(\d{4})"
Code:
```python
import re

if __name__ == '__main__':
    phone_numbers = ["916-111-1111", "(916)111 -1111", "( 916)111-1111", "- (916 )111-1111", "(a56)111-1111"]
    for phone_number_str in phone_numbers:
        # Raw string so that \d and \D reach the regex engine unchanged
        results = re.findall(r"(\d{1,3})\D*(\d{3})\D*(\d{4})", phone_number_str)
        if not results:
            print(f"[main] FAIL: {phone_number_str}")
            continue  # skip results[0], which would raise IndexError
        print(results[0])
```
Result:
('916', '111', '1111')
('916', '111', '1111')
('916', '111', '1111')
('916', '111', '1111')
('56', '111', '1111')
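Note: \D matches any non-digit character, so this pattern extracts the digits but does not validate the format. As the result shows, all five inputs match, including the ones the question wants rejected.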
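To see what \D does on its own, removing every non-digit character shows that valid and invalid inputs reduce to the same digits, which is why a \D-based pattern can pull out the numbers but cannot tell the formats apart:

```python
import re

for s in ['(916) 111-1111', '916-111-1111', '( 916)111 -1111']:
    # \D matches any character that is not a digit; replace each one with ''
    print(repr(s), re.sub(r'\D', '', s))
# '(916) 111-1111' 9161111111
# '916-111-1111' 9161111111
# '( 916)111 -1111' 9161111111
```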