Python 3.
I'm trying to combine all the regex patterns for recognizing telephone numbers into one variable, separated by pipes.
I get a TypeError while iterating over my input data structure, in this case a dictionary of name: phone number pairs:
import re
text = {'Forest': '123-456-7890', 'Johanna': '(987) 654-4321', 'Mom': '555.555.5555', 'Camille':'9988776655'}
regexPat = r'(\d{3})-(\d{3}-\d{4})|(\(\d{3}\)) (\d{3}-\d{4})|(\d{3})\.(\d{3}\.\d{4}|(\d{3})(\d{7}))'
print("Using 'pipes' to separate possible regex patterns")
phNum = re.compile(regexPat)
for k in text:
    mo = phNum.search(text[k])
    print(k + '\'s area code: ' + mo.group(1))
    print('Suffix: ' + mo.group(2), end=' Whole Number: ')
    print(mo.groups())
RESULT / ERROR:
Using 'pipes' to separate possible regex patterns
Forest's area code: 123
Suffix: 456-7890 Whole Number: ('123', '456-7890', None, None, None, None, None, None)
Traceback (most recent call last):
  File "z:\documents\programming\mypythonscripts\isphonenumber.py", line 16, in <module>
    print(k + '\'s area code: ' + mo.group(1))
TypeError: can only concatenate str (not "NoneType") to str
Based on the output printed before the failure, I think some of the patterns are not matching, so their groups come back as None.
Is there a workaround for this? Should I be looking at optional matching?
CodePudding user response:
You have mostly worked out why it fails. Your pattern has 8 capturing groups. For 'Forest' the first alternative matches, so groups 1 and 2 are filled and your code works. In the second iteration, for 'Johanna', the second alternative matches, so groups 1 and 2 are None and the match lands in groups 3 and 4. mo.group(1) is then None, and concatenating None to a str raises the TypeError.
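To see which groups each alternative fills, here is a minimal sketch that keeps the pattern from the question unchanged and prints only the groups that are not None. The helper name `filled_groups` is made up for illustration.

```python
import re

# The pattern from the question, unchanged: 8 capturing groups spread over the alternatives.
pattern = re.compile(
    r'(\d{3})-(\d{3}-\d{4})|(\(\d{3}\)) (\d{3}-\d{4})|(\d{3})\.(\d{3}\.\d{4}|(\d{3})(\d{7}))'
)

def filled_groups(number):
    """Return only the groups that took part in the match (hypothetical helper)."""
    mo = pattern.search(number)
    if mo is None:
        # search() returns None when no alternative matches at all
        return []
    return [g for g in mo.groups() if g is not None]

for number in ('123-456-7890', '(987) 654-4321', '555.555.5555', '9988776655'):
    print(number, '->', filled_groups(number))
```

Note that for '(987) 654-4321' the area code group still contains the parentheses, and that '9988776655' gets no match at all with the pattern as posted, because the last alternative is nested inside the group that follows `\.`.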
As @Wiktor suggested, a small change to the same approach gets you to the solution in the link. My solution is slightly different: it uses only 3 groups (group 1 for the prefix, groups 2 and 3 for the suffix), like this:
import re
text = {'Forest': '123-456-7890', 'Johanna': '(987) 654-4321', 'Mom': '555.555.5555', 'Camille':'9988776655'}
pattern = r"^\(?(\d{3})(?:\-|\)\s|\.|)?(\d{3}(\-|\.|)?\d{4})$"
num = re.compile(pattern)
for key, value in text.items():
    mo = num.search(value)
    prefix = mo.group(1)
    # keep only the digits of the suffix, dropping any separator
    suffix = ''.join(x for x in mo.group(2) if x.isdigit())
    #suffix = ''.join(x for x in mo.group(2) if x not in mo.group(3)) # works as well
    print(key + '\'s area code: ' + prefix)
    print('Suffix:', suffix, end=' Whole Number: ')
    print(prefix + suffix)
# Output:
Forest's area code: 123
Suffix: 4567890 Whole Number: 1234567890
Johanna's area code: 987
Suffix: 6544321 Whole Number: 9876544321
Mom's area code: 555
Suffix: 5555555 Whole Number: 5555555555
Camille's area code: 998
Suffix: 8776655 Whole Number: 9988776655
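One thing to be aware of with the compact pattern above: the opening parenthesis and each separator are optional independently of one another, so it also accepts inputs that mix formats. Whether that matters depends on your data. A small check, using sample strings invented for illustration:

```python
import re

# The compact pattern from the answer above, unchanged.
compact = re.compile(r"^\(?(\d{3})(?:\-|\)\s|\.|)?(\d{3}(\-|\.|)?\d{4})$")

# Made-up inputs: one well-formed number and three with mixed or unbalanced punctuation.
samples = ['(987) 654-4321', '(123.456-7890', '123) 456-7890', '123-456.7890']

results = {s: bool(compact.search(s)) for s in samples}
for s, matched in results.items():
    print(s, '->', matched)
```

All four samples match. If mixed formats should be rejected, spelling out each accepted format as its own alternative is the stricter choice.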
CodePudding user response:
I suggest using your pattern without any capturing groups to keep it simple. Once you have a match, strip the non-digit characters and get the parts you want with plain slicing:
import re
text = {'Forest': '123-456-7890', 'Johanna': '(987) 654-4321', 'Mom': '555.555.5555', 'Camille':'9988776655'}
regexPat = r'^(?:\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}|\d{3}\.\d{3}\.\d{4}|\d{10})$'
print("Using 'pipes' to separate possible regex patterns")
phNum = re.compile(regexPat)
for k in text:
    mo = phNum.search(text[k])
    if mo:
        # keep only the digits, then slice off the area code
        phone_num_text = "".join(c for c in mo.group() if c.isdigit())
        print(f"{k}'s area code: {phone_num_text[:3]}")
        print(f'Suffix: {phone_num_text[3:]}')
        print(f'Whole Number: {phone_num_text}')
See the Python demo. Output:
Using 'pipes' to separate possible regex patterns
Forest's area code: 123
Suffix: 4567890
Whole Number: 1234567890
Johanna's area code: 987
Suffix: 6544321
Whole Number: 9876544321
Mom's area code: 555
Suffix: 5555555
Whole Number: 5555555555
Camille's area code: 998
Suffix: 8776655
Whole Number: 9988776655
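If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: the names `PHONE_RE` and `split_phone` are made up here, and it uses `fullmatch()` instead of the `^` and `$` anchors, which should be equivalent for these inputs.

```python
import re

# Same four formats as above; fullmatch() requires the whole string to match,
# so the ^ and $ anchors are left out.
PHONE_RE = re.compile(r'\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}|\d{3}\.\d{3}\.\d{4}|\d{10}')

def split_phone(raw):
    """Return (area_code, suffix) as digit strings, or None for an unsupported format."""
    if PHONE_RE.fullmatch(raw) is None:
        return None
    digits = ''.join(c for c in raw if c.isdigit())
    return digits[:3], digits[3:]

print(split_phone('(987) 654-4321'))
print(split_phone('12345'))
```

Returning None for input that does not match lets the caller decide how to handle bad data instead of hitting an exception inside the loop.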