I just started learning dictionaries and regex, and I'm having trouble building a dictionary. In my task, an area code is a plus sign followed by three digits. The phone number itself is 7-8 digits. The phone number may be separated from the area code by whitespace, but not necessarily.
import re

def find_phone_numbers(text: str) -> dict:
    pattern = r'\+\w{3} \w{8}|\+\w{11}|\+\w{3} \w{7}|\+\w{10}|\w{8}|\w{7}'
    match = re.findall(pattern, text)
    str1 = " "
    phone_str = str1.join(match)
    phone_dict = {}
    phones = phone_str.split(" ")
    for phone in phones:
        if phone[0] == "+":
            phone0 = phone
            if phone_str[0:4] not in phone_dict.keys():
                phone_dict[phone_str[0:4]] = [phone_str[5:]]
    return phone_dict

The result should be:
print(find_phone_numbers("+372 56887364 +37256887364 +33359835647 56887364 11 1234567 327 1 11111111")) ->
{'+372': ['56887364', '56887364'], '+333': ['59835647'], '': ['56887364', '1234567', '11111111']}
The main problem is that a phone number and its area code can be written together or separately. My idea was to use a for loop to strip the "tail" (the phone number) so that only the area code remains, but I don't understand how to strip the tail from something like +33359835647. How can this be done, and is there a more efficient way?
CodePudding user response:
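Try this (the regex pattern is explained on Regex101). The first group optionally matches a plus sign and three digits, \s* allows optional whitespace, and the second group matches a 7-8 digit number:
import re

s = "+372 56887364 +37256887364 +33359835647 56887364 11 1234567 327 1 11111111"
pat = re.compile(r"(\+\d{3})?\s*(\d{7,8})")
out = {}
# findall returns (prefix, number) tuples; prefix is "" when no area code matched
for pref, number in pat.findall(s):
    out.setdefault(pref, []).append(number)
print(out)

Prints:
{
    "+372": ["56887364", "56887364"],
    "+333": ["59835647"],
    "": ["56887364", "1234567", "11111111"],
}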
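
To match the signature from the question, the same idea can be wrapped in a function. This is only a sketch: it assumes the area code is written with a literal plus sign, as the task describes, and the name PHONE_RE is made up for illustration.

```python
import re

# Hypothetical module-level name. The pattern is the one from the answer:
# an optional "+ddd" area code, optional whitespace, then 7-8 digits.
PHONE_RE = re.compile(r"(\+\d{3})?\s*(\d{7,8})")


def find_phone_numbers(text: str) -> dict:
    """Group phone numbers by area code; numbers without one go under ''."""
    result = {}
    # With two capture groups, findall yields (area_code, number) tuples.
    # A group that did not take part in the match comes back as "".
    for area_code, number in PHONE_RE.findall(text):
        result.setdefault(area_code, []).append(number)
    return result


sample = "+372 56887364 +37256887364 +33359835647 56887364 11 1234567 327 1 11111111"
print(find_phone_numbers(sample))
```

Because the area code and the number are separate capture groups, there is no need to slice the "tail" off by hand: the regex already splits +33359835647 into +333 and 59835647.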