My goal is to use the re module in Python to parse phone numbers that are in the appropriate format, and return a tuple of 3 sets of digits.
The appropriate formats are:
- with parentheses around the area code, and allowing multiple spaces until the 7-digit body of the phone number
- without parentheses around the area code, an optional hyphen between the area code and the 7-digit body, and NO spaces between the area code and the 7-digit body.
import re

pattern = r"^ *(\((\d\d\d)\) *|\d\d\d\-?)(\d\d\d)\-?(\d\d\d\d) *$"
bad = "404 555-3355"
good_string = " (444) 555-5555 "
good_string_2 = " 505-505-5555 "
for s in (bad, good_string, good_string_2):
    values = re.match(pattern, s).groups()
I tried to use the | to specify that spaces are allowed only when there are () around the area code, but otherwise spaces are not allowed. But this throws an "AttributeError: 'NoneType' object has no attribute 'groups'"
I have tried reading the documentation on the re module and have searched around, but have not found a solution yet.
CodePudding user response:
I would express this using an alternation for the two versions of acceptable leading area codes:
(?:\(\d{3}\)\s*|\d{3}-)\d{3}-\d{4}
Python script:
import re

inp = "404 555-3355 (444) 555-5555 505-505-5555 "
nums = re.findall(r'(?:\(\d{3}\)\s*|\d{3}-)\d{3}-\d{4}', inp)
output = [re.findall(r'\d+', x) for x in nums]
print(output) # [['444', '555', '5555'], ['505', '505', '5555']]
As you can see, only the valid latter two phone numbers match the regex pattern.
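If you would rather get the three digit groups in a single pass, one option is to put capture groups inside the alternation and use re.finditer. This is only a sketch under the same assumptions as the pattern above (the hyphen after a bare area code is required here); the names PHONE_RE and results are just for illustration.

```python
import re

# Same shape as the pattern above, but with capture groups:
# group 1 = area code written in parentheses, group 2 = bare area code,
# group 3 = next three digits, group 4 = last four digits.
PHONE_RE = re.compile(r'(?:\((\d{3})\)\s*|(\d{3})-)(\d{3})-(\d{4})')

inp = "404 555-3355 (444) 555-5555 505-505-5555 "

results = []
for m in PHONE_RE.finditer(inp):
    # Exactly one of the two area-code groups takes part in a match;
    # the other one is None.
    area = m.group(1) or m.group(2)
    results.append((area, m.group(3), m.group(4)))

print(results)  # [('444', '555', '5555'), ('505', '505', '5555')]
```

Because only one branch of the alternation can match, `m.group(1) or m.group(2)` picks whichever area code was actually captured.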
CodePudding user response:
The error occurs because re.match returns None for the bad string instead of a re.Match object. Calling groups() on None then raises this AttributeError.
You will have to filter out the None values and call groups() only on the valid matches, or find a different solution.
This should bring you closer:
import re

pattern = r"^ *(\(\d{3}\) *|\d{3}-?)(\d{3})\-?(\d{4}) *$"
xs = [
"404 555-3355", #bad
" (444) 555-5555 ", #good
" 505-505-5555 ", #good
]
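matches = [re.match(pattern, x) for x in xs]
# re.match returns None for strings that do not match, so skip those
filtered_and_grouped_matches = [x.groups() for x in matches if x is not None]
print(filtered_and_grouped_matches)
# [('(444) ', '555', '5555'), ('505-', '505', '5555')]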
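Note that the first group in the pattern above still contains the parentheses, trailing spaces, or hyphen. To get a clean tuple of three digit strings, one possible approach is to capture the area code separately in each branch and check for None before reading the groups. The following is a minimal sketch, not the only way to do it; parse_phone is a hypothetical helper name, and it assumes the rules from the question (spaces only after a parenthesized area code, optional hyphens otherwise).

```python
import re

# Group 1 = area code in parentheses, group 2 = bare area code,
# group 3 = next three digits, group 4 = last four digits.
PHONE_RE = re.compile(r" *(?:\((\d{3})\) *|(\d{3})-?)(\d{3})-?(\d{4}) *")


def parse_phone(s):
    """Return (area, prefix, line) as digit strings, or None if s is invalid."""
    m = PHONE_RE.fullmatch(s)  # fullmatch anchors at both ends of the string
    if m is None:
        return None  # avoids calling .groups() on None
    area = m.group(1) or m.group(2)  # whichever branch matched
    return (area, m.group(3), m.group(4))


for s in ["404 555-3355", " (444) 555-5555 ", " 505-505-5555 "]:
    print(repr(s), "->", parse_phone(s))
```

With the three strings from the question, the first call returns None and the other two return ('444', '555', '5555') and ('505', '505', '5555').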