With regex in Python, can one use the or operator within a group? What would cause the result to be

Time:02-16

My goal is to use the re module in Python to parse phone numbers that are in one of the accepted formats, and return a tuple of the three groups of digits.

The accepted formats are:

  1. with parentheses around the area code, followed by any number of spaces before the 7-digit body of the phone number
  2. without parentheses around the area code, with an optional hyphen between the area code and the 7-digit body, and NO spaces between the area code and the 7-digit body.
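Both rules can be expressed in a single pattern by putting the `|` inside a non-capturing group and giving each alternative its own capture group for the area code. A minimal sketch, assuming that leading and trailing spaces are allowed in both formats (as in the example strings below) and that the hyphen inside the 7-digit body is optional; `PHONE_RE` and `parse_phone` are hypothetical names:

```python
import re

# Hypothetical pattern: the | sits inside a non-capturing group (?:...), so it
# only chooses between the two area-code formats, not the whole pattern.
PHONE_RE = re.compile(
    r"""
    ^[ ]*                               # optional leading spaces
    (?:
        \((?P<paren>\d{3})\)[ ]*        # format 1: (444) then any number of spaces
      |
        (?P<plain>\d{3})-?              # format 2: 505 or 505- with no spaces
    )
    (?P<prefix>\d{3})-?(?P<line>\d{4})  # 7-digit body: 555-5555 or 5555555
    [ ]*$                               # optional trailing spaces
    """,
    re.VERBOSE,
)


def parse_phone(s):
    """Return (area, prefix, line) for a valid number, or None otherwise."""
    m = PHONE_RE.match(s)
    if m is None:
        return None
    # Exactly one of the two area-code groups took part in the match.
    area = m.group("paren") or m.group("plain")
    return area, m.group("prefix"), m.group("line")


print(parse_phone("404 555-3355"))               # None
print(parse_phone("     (444)    555-5555   "))  # ('444', '555', '5555')
print(parse_phone("    505-505-5555   "))        # ('505', '505', '5555')
```

In `re.VERBOSE` mode, literal spaces in the pattern are ignored, which is why the sketch writes them as the character class `[ ]`.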
import re

pattern = r"^ *(\((\d\d\d)\) *|\d\d\d\-?)(\d\d\d)\-?(\d\d\d\d) *$"
bad = "404 555-3355"
good_string = "     (444)    555-5555   "
good_string_2 = "    505-505-5555   "
s = bad  # the AttributeError below is raised for this string
values = re.match(pattern, s).groups()

I tried to use | inside the group so that spaces are allowed only when the area code is wrapped in (), and not allowed otherwise. But running this throws "AttributeError: 'NoneType' object has no attribute 'groups'".

I have tried reading the documentation on the re module and have searched around, but have not found a solution yet.
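One way to see what is happening is to run the same pattern over each string separately and check for a failed match before calling groups(). In this sketch (the `samples` list and the loop are illustrative additions), the alternation inside the group behaves as intended: the bad string simply does not match, so re.match() returns None, and calling .groups() on None is what raises the AttributeError.

```python
import re

pattern = r"^ *(\((\d\d\d)\) *|\d\d\d\-?)(\d\d\d)\-?(\d\d\d\d) *$"
samples = ["404 555-3355", "     (444)    555-5555   ", "    505-505-5555   "]

for s in samples:
    m = re.match(pattern, s)
    if m is None:
        # No match: calling m.groups() here would raise the AttributeError.
        print(repr(s), "-> no match")
    else:
        print(repr(s), "->", m.groups())
```

For the two good strings this prints ('(444)    ', '444', '555', '5555') and ('505-', None, '505', '5555'): the outer group also captures the parentheses, spaces, or hyphen, and the inner area-code group is None when the second alternative matched.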

Answer:

I would express this using an alternation for the two versions of acceptable leading area codes:

(?:\(\d{3}\)\s*|\d{3}-)\d{3}-\d{4}

Python script:

import re

inp = "404 555-3355 (444)    555-5555   505-505-5555   "
nums = re.findall(r'(?:\(\d{3}\)\s*|\d{3}-)\d{3}-\d{4}', inp)
output = [re.findall(r'\d+', x) for x in nums]
print(output)  # [['444', '555', '5555'], ['505', '505', '5555']]

As you can see, only the last two phone numbers, which are the valid ones, match the regex pattern.
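If the three groups of digits are wanted directly, the same alternation can carry capture groups. When a pattern has more than one group, re.findall() returns one tuple of groups per match, and a group from the alternative that did not match comes back as an empty string. A sketch under that approach (the `PATTERN` name and the extra groups are illustrative additions, not part of the answer above):

```python
import re

# One capture group per area-code alternative, plus two for the 7-digit body.
PATTERN = r"(?:\((\d{3})\)\s*|(\d{3})-)(\d{3})-(\d{4})"

inp = "404 555-3355 (444)    555-5555   505-505-5555   "
# Each match is a 4-tuple; exactly one of the first two entries is non-empty.
output = [(paren or plain, prefix, line)
          for paren, plain, prefix, line in re.findall(PATTERN, inp)]
print(output)  # [('444', '555', '5555'), ('505', '505', '5555')]
```

This avoids the second re.findall() call per number, at the cost of merging the two area-code groups by hand.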

Answer:

The error occurs because re.match() returns None for the bad string instead of a re.Match object, and calling groups() on None raises the AttributeError.

You will have to filter out the None values and call groups() only on the valid matches, or use a different approach.

This should bring you closer:

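import re

pattern = r"^ *(\(\d{3}\) *|\d{3}-?)(\d{3})\-?(\d{4}) *$"
xs = [
    "404 555-3355",               # bad
    "     (444)    555-5555   ",  # good
    "    505-505-5555   ",        # good
]

# re.match() returns None for strings that do not match, so drop those
# before calling groups() on the rest.
matches = [re.match(pattern, x) for x in xs]
filtered_and_grouped_matches = [m.groups() for m in matches if m is not None]
print(filtered_and_grouped_matches)  # [('(444)    ', '555', '5555'), ('505-', '505', '5555')]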
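The first group captured by this pattern still includes the parentheses, spaces, or hyphen around the area code. One way to reduce each match to the three digit strings the question asks for is to delete every non-digit from that group with re.sub(). A sketch, assuming Python 3.8+ for the `:=` operator, which here binds each match and skips the None results in a single comprehension:

```python
import re

pattern = r"^ *(\(\d{3}\) *|\d{3}-?)(\d{3})\-?(\d{4}) *$"
xs = ["404 555-3355", "     (444)    555-5555   ", "    505-505-5555   "]

parsed = [
    # \D matches any non-digit, so "(444)    " and "505-" both become digits only.
    (re.sub(r"\D", "", m.group(1)), m.group(2), m.group(3))
    for x in xs
    if (m := re.match(pattern, x)) is not None
]
print(parsed)  # [('444', '555', '5555'), ('505', '505', '5555')]
```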