Regex to unify a format of phone numbers in Python

I'm trying to write a regex that matches a phone number of the form +34 (prefix), a single space, then 9 digits that may or may not be separated by spaces.

+34 886 24 68 98
+34 980 202 157

I would need a regex to work with these two example cases.

I tried `^(\+34)\s([ *]|[0-9]{9})`, but it doesn't work.

Ultimately I'd like to normalise every phone number to the same format: the +34 prefix, a single space, then 9 digits, whichever of these cases is given. I'm using re.sub() for that, but I'm not sure how.

+34 886 24 68 98 -> ?
+34 980 202 157  -> ?

+34 846082423 -> `^(\+34)\s(\d{9})$`
+34920459596  -> `^(\+34)(\d{9})$`

import re

from faker import Faker
from faker.providers import BaseProvider

#fake = Faker("es_ES")

class CustomProvider(BaseProvider):

    def phone(self):
        #phone = fake.phone_number()
        phone = "+34812345678"
        return re.sub(r'^(\+34)(\d{9})$', r'\1 \2', phone)

CodePudding user response：

You can try:

^\+34\s*(?:\d\s*){9}$

^ - beginning of the string

\+34\s* - match +34 followed by any amount of whitespace

(?:\d\s*){9} - match a digit followed by any amount of whitespace, 9 times

$ - end of string
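
To get from matching to the unified format the question asks for, one option is to validate with the pattern above and then rebuild the string from its digits. This is only a minimal sketch, assuming the target format is +34, one space, then nine digits; `normalize_phone` and `PHONE_RE` are hypothetical names, and a capturing group has been added around the nine-digit part:

```python
import re

# Pattern from the answer above, with a capturing group around the
# nine-digit part: +34, optional whitespace, then nine digits that may
# each be followed by whitespace.
PHONE_RE = re.compile(r'^\+34\s*((?:\d\s*){9})$')

def normalize_phone(phone):
    """Return '+34 XXXXXXXXX', or None if the input does not match."""
    match = PHONE_RE.match(phone)
    if match is None:
        return None
    # Drop every whitespace character from the nine-digit part.
    digits = re.sub(r'\s+', '', match.group(1))
    return f"+34 {digits}"

print(normalize_phone("+34 886 24 68 98"))  # +34 886246898
print(normalize_phone("+34 980 202 157"))   # +34 980202157
print(normalize_phone("+34920459596"))      # +34 920459596
```

Returning None for non-matching input is a design choice of this sketch; raising an exception or returning the input unchanged would work just as well.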


CodePudding user response：

I would capture the numbers like this: r"(\+34(?:\s?\d){9})". That allows you to search for numbers while letting a whitespace character optionally appear before any of the digits. The non-capturing group (?:...) lets \s?\d repeat without each digit being listed as a group of its own.

import re

nums = """
Number 1: +34 886 24 68 98
Number 2: +34 980 202 157
Number 3: +34812345678
"""

number_re = re.compile(r"(\+34(?:\s?\d){9})")

for match in number_re.findall(nums):
    print(match)

+34 886 24 68 98
+34 980 202 157
+34812345678
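
Since the question mentions re.sub(), the same pattern can also rewrite the numbers in place by passing a function as the replacement. This is a rough sketch rather than a definitive solution; the `tidy` helper and the sample `text` are made up for illustration:

```python
import re

number_re = re.compile(r"\+34(?:\s?\d){9}")

def tidy(match):
    # Keep only the digits of the matched number (prefix included),
    # then re-insert the plus sign and a single space after the prefix.
    digits = re.sub(r"\D", "", match.group(0))
    return f"+{digits[:2]} {digits[2:]}"

text = "Call +34 886 24 68 98 or +34812345678."
result = number_re.sub(tidy, text)
print(result)  # Call +34 886246898 or +34 812345678.
```

A replacement function is used here because a plain replacement string such as r'\1 \2' cannot remove the spaces scattered between the digits.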

CodePudding user response：

Here's a simple approach: use regex to get the plus sign and all the numbers into an array (one char per element), then use other list and string manipulation operations to format it the way you like.

import re

p1 = "+34 886 24 68 98"
p2 = "+34 980 202 157"

pattern = r'[+\d]'

m1 = re.findall(pattern, p1)
m2 = re.findall(pattern, p2)

m1_str = f"{''.join(m1[:3])} {''.join(m1[3:])}"
m2_str = f"{''.join(m2[:3])} {''.join(m2[3:])}"

print(m1_str)  # +34 886246898
print(m2_str)  # +34 980202157

Or removing spaces using string replacement instead of regex:

p1 = "+34 886 24 68 98"
p2 = "+34 980 202 157"

p1_compact = p1.replace(' ', '')
p2_compact = p2.replace(' ', '')

p1_str = f"{p1_compact[:3]} {p1_compact[3:]}"
p2_str = f"{p2_compact[:3]} {p2_compact[3:]}"

print(p1_str)  # +34 886246898
print(p2_str)  # +34 980202157