Multiple regex patterns for input data: TypeError: can only concatenate str (not "NoneType") to str

Python 3.

I'm trying to include all the possible regex patterns for identifying telephone numbers into one variable. I am separating them with pipes.

I receive the TypeError when iterating through my input data structure, in this case a dictionary of names to phone numbers:

import re

text = {'Forest': '123-456-7890', 'Johanna': '(987) 654-4321', 'Mom': '555.555.5555', 'Camille':'9988776655'}

regexPat = r'(\d{3})-(\d{3}-\d{4})|(\(\d{3}\)) (\d{3}-\d{4})|(\d{3})\.(\d{3}\.\d{4}|(\d{3})(\d{7}))'

print("Using 'pipes' to separate possible regex patterns")

phNum = re.compile(regexPat)

for k in text:
        mo = phNum.search(text[k])
        print(k + '\'s area code: ' + mo.group(1))
        print('Suffix: ' + mo.group(2), end=' Whole Number: ')
        print(mo.groups())

RESULT / ERROR:

Using 'pipes' to separate possible regex patterns
Forest's area code: 123
Suffix: 456-7890 Whole Number: ('123', '456-7890', None, None, None, None, None, None)
Traceback (most recent call last):
  File "z:\documents\programming\mypythonscripts\isphonenumber.py", line 16, in <module>
    print(k + '\'s area code: ' + mo.group(1))
TypeError: can only concatenate str (not "NoneType") to str

Based on the print statements up until failure, what I think is happening is the regex patterns are not finding any hits so they're being returned as NoneType data to the groups.

Is there a workaround for this type of thing? Should I be looking at optional matching?

CodePudding user response：

I think you have mostly worked out why it is not working. Your pattern has 8 capturing groups. For 'Forest' the match lands in groups 1 and 2, which is why the first iteration works. In the second iteration, for 'Johanna', groups 3 and 4 are the ones that match, so groups 1 and 2 return None. Concatenating mo.group(1) to a string is the point where the code fails.
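
To make that visible, here is a small sketch that prints the groups for the first two entries using the pattern from the question unchanged. The helper name matched_groups is hypothetical and only there for illustration:

```python
import re

# The 8-group pattern from the question, copied as-is.
regexPat = r'(\d{3})-(\d{3}-\d{4})|(\(\d{3}\)) (\d{3}-\d{4})|(\d{3})\.(\d{3}\.\d{4}|(\d{3})(\d{7}))'
phNum = re.compile(regexPat)

def matched_groups(number):
    """Hypothetical helper: return only the groups that took part in the match."""
    mo = phNum.search(number)
    if mo is None:
        return None  # no alternative matched at all
    return [g for g in mo.groups() if g is not None]

print(phNum.search('123-456-7890').groups())
# ('123', '456-7890', None, None, None, None, None, None)
print(phNum.search('(987) 654-4321').groups())
# (None, None, '(987)', '654-4321', None, None, None, None)
print(matched_groups('(987) 654-4321'))
# ['(987)', '654-4321']
```

Filtering out the None entries like this is one possible workaround if you want to keep the groups. Note also that with the pattern as written, '9988776655' does not seem to match at all, because the ten-digit alternative sits inside group 6 after the literal dot. For that entry search returns None, and mo.group(1) would raise AttributeError rather than TypeError.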

As @Wiktor suggested, you could keep roughly the same approach and, with a small change, use the solution from the link. My solution is slightly different: it captures only 3 groups (group 1 for the prefix, groups 2 and 3 for the suffix), like this:

import re

text = {'Forest': '123-456-7890', 'Johanna': '(987) 654-4321', 'Mom': '555.555.5555', 'Camille':'9988776655'}
pattern = r"^\(?(\d{3})(?:\-|\)\s|\.|)?(\d{3}(\-|\.|)?\d{4})$"
num = re.compile(pattern)
for key, value in text.items():
    mo = num.search(value)
    prefix = mo.group(1)
    # keep only the digits of the suffix, dropping the separator
    suffix = ''.join(x for x in mo.group(2) if x.isdigit())
    #suffix = ''.join(x for x in mo.group(2) if x not in mo.group(3)) #works as well
    print(key + '\'s area code: ' + prefix)
    print('Suffix: ', suffix, end=' Whole Number: ')
    print(prefix + suffix)

# Output:
Forest's area code: 123
Suffix:  4567890 Whole Number: 1234567890
Johanna's area code: 987
Suffix:  6544321 Whole Number: 9876544321
Mom's area code: 555
Suffix:  5555555 Whole Number: 5555555555
Camille's area code: 998
Suffix:  8776655 Whole Number: 9988776655
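
The loop above assumes every value matches. For input that does not fit the pattern, search returns None and mo.group(1) raises AttributeError. A minimal sketch of a guard, reusing the same pattern (split_phone is a hypothetical helper name, not part of the answer above):

```python
import re

# Same 3-group pattern as in the answer above.
pattern = r"^\(?(\d{3})(?:\-|\)\s|\.|)?(\d{3}(\-|\.|)?\d{4})$"
num = re.compile(pattern)

def split_phone(value):
    """Hypothetical helper: return (prefix, suffix) as digit strings, or None if no match."""
    mo = num.search(value)
    if mo is None:
        return None
    prefix = mo.group(1)
    suffix = ''.join(x for x in mo.group(2) if x.isdigit())
    return prefix, suffix

print(split_phone('(987) 654-4321'))  # ('987', '6544321')
print(split_phone('not a number'))    # None
```

Because every separator in this pattern is optional, it also accepts mixed forms such as '(123-456.7890'. Whether that leniency is acceptable depends on how strict you need the validation to be.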

CodePudding user response：

I suggest using your pattern without any groups to make it simpler, and once you have a match, remove the non-digit chars and get the parts you want with mere slicing:

import re
 
text = {'Forest': '123-456-7890', 'Johanna': '(987) 654-4321', 'Mom': '555.555.5555', 'Camille':'9988776655'}
 
regexPat = r'^(?:\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}|\d{3}\.\d{3}\.\d{4}|\d{10})$'
 
print("Using 'pipes' to separate possible regex patterns")
 
phNum = re.compile(regexPat)
 
for k in text:
    mo = phNum.search(text[k])
    if mo:
        phone_num_text = "".join(c for c in mo.group() if c.isdigit())
        print(f"{k}'s area code: {phone_num_text[:3]}")
        print(f'Suffix: {phone_num_text[3:]}')
        print(f'Whole Number: {phone_num_text}')

See the Python demo. Output:

Using 'pipes' to separate possible regex patterns
Forest's area code: 123
Suffix: 4567890
Whole Number: 1234567890
Johanna's area code: 987
Suffix: 6544321
Whole Number: 9876544321
Mom's area code: 555
Suffix: 5555555
Whole Number: 5555555555
Camille's area code: 998
Suffix: 8776655
Whole Number: 9988776655
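
The same idea can be wrapped in a function. The sketch below is a variant, not the code above: it assumes fullmatch in place of the ^...$ anchors and re.sub(r'\D', '', ...) in place of the isdigit filter, and the names PHONE and parse_phone are hypothetical:

```python
import re

# Same four alternatives as above; fullmatch makes the ^ and $ anchors unnecessary.
PHONE = re.compile(r'\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}|\d{3}\.\d{3}\.\d{4}|\d{10}')

def parse_phone(value):
    """Hypothetical helper: return (area_code, suffix) as digit strings, or None."""
    if PHONE.fullmatch(value) is None:
        return None
    digits = re.sub(r'\D', '', value)  # drop every non-digit character
    return digits[:3], digits[3:]

print(parse_phone('(987) 654-4321'))  # ('987', '6544321')
print(parse_phone('123-456.7890'))    # None, mixed separators are rejected
```

Listing each format as its own alternative keeps the validation strict, and slicing the cleaned digits avoids any dependence on which alternative matched.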