Given a string, return a dictionary of all the phone numbers in that text

Time:10-13

I just started learning dictionaries and regex, and I'm having trouble building a dictionary. In my task, an area code is a plus sign followed by three digits. The phone number itself is 7 or 8 digits long. The phone number may be separated from the area code by whitespace, but not necessarily.

import re

def find_phone_numbers(text: str) -> dict:
    pattern = r'\+\w{3} \w{8}|\+\w{11}|\+\w{3} \w{7}|\+\w{10}|\w{8}|\w{7}'
    match = re.findall(pattern, text)
    str1 = " "
    phone_str = str1.join(match)
    phone_dict = {}
    phones = phone_str.split(" ")
    for phone in phones:
        if phone[0] == "+":
            phone0 = phone
    if phone_str[0:4] not in phone_dict.keys():
        phone_dict[phone_str[0:4]] = [phone_str[5:]]
    return phone_dict

The result should be:

print(find_phone_numbers("+372 56887364 +37256887364 +33359835647 56887364 +11 1234567 +327 1 11111111")) ->

{'+372': ['56887364', '56887364'], '+333': ['59835647'], '': ['56887364', '1234567', '11111111']}

The main problem is that a phone number can be written together with its area code or separated from it. My idea was to use a for loop to strip off the "tail" (the phone number) so that only the area code remains, but I don't understand how to strip the tail from something like +33359835647. How can this be done, and is there a more efficient way?

CodePudding user response:

Try this (the regex pattern is explained on Regex101):

import re

s = "+372 56887364  +37256887364  +33359835647  56887364 +11 1234567 +327 1 11111111"
pat = re.compile(r"(\+\d{3})?\s*(\d{7,8})")

out = {}
for pref, number in pat.findall(s):
    out.setdefault(pref, []).append(number)

print(out)

Prints:

{
    "+372": ["56887364", "56887364"],
    "+333": ["59835647"],
    "": ["56887364", "1234567", "11111111"],
}
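
To fit the function signature from the question, the same idea can be wrapped up as below. This is only a sketch: it assumes area codes are written with a literal plus sign, as the question describes, and the names PHONE_RE and sample are made up for the illustration. The pattern is spelled out with re.VERBOSE so that each part carries its own comment.

```python
import re

# Verbose form of the pattern: whitespace and "#" comments inside it are ignored.
PHONE_RE = re.compile(
    r"""
    (\+\d{3})?   # group 1: optional area code, a plus sign and exactly three digits
    \s*          # optional whitespace between the area code and the number
    (\d{7,8})    # group 2: the phone number, 7 or 8 digits (greedy, so 8 if present)
    """,
    re.VERBOSE,
)


def find_phone_numbers(text: str) -> dict:
    """Group the phone numbers found in text by area code ('' when there is none)."""
    phones = {}
    # With two groups, findall yields (area_code, number) tuples;
    # a group that did not take part in the match comes back as ''.
    for area_code, number in PHONE_RE.findall(text):
        phones.setdefault(area_code, []).append(number)
    return phones


# Hypothetical sample, modelled on the input from the question.
sample = "+372 56887364 +37256887364 +33359835647 56887364 +11 1234567 +327 1 11111111"
print(find_phone_numbers(sample))
```

Because the area code group is optional, "+11" and "+327 1" simply fail to form a full match and the scan moves on, which is why 1234567 and 11111111 end up under the empty key. One limitation to keep in mind: a longer run of digits is not rejected, the pattern just takes its first 8 digits, so add word boundaries or lookarounds if such runs can appear in your input.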