Dealing with phone numbers in python with a space in the middle

Time:09-16

I have the following piece of code, but something odd is happening: when a phone number has a space in the middle, it turns into 0. For example, 07777 777777 becomes 0.

import pandas as pd


df['Phone_Number'] = df['Phone_Number'].str.replace(r'[^\w\s]= ', '')
df['Phone_Number'] = pd.to_numeric(df['Phone_Number'], errors="coerce").fillna(0).astype('int64').to_frame()

CodePudding user response:

This is expected, because a string with a space in the middle cannot be converted to an integer directly. However, you can do the following.

This is my DataFrame:

>>> df
     name        toy       born            aa
1  Batman  Batmobile 1940-04-25  07777 777777

Now, using the apply function to strip the space before converting:

>>> df["aa"] = df.apply(lambda row: int(row["aa"].replace(" ", "")), axis=1)
>>> df
     name        toy       born          aa
1  Batman  Batmobile 1940-04-25  7777777777
>>> 

As you can see, the phone numbers are converted into integers. If you explicitly want int64:

import numpy as np
df["aa"] = df.apply(lambda row: np.int64(row["aa"].replace(" ", "")), axis=1)
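One caveat visible in the output above: converting to an integer drops the leading zero (07777 777777 becomes 7777777777). A minimal sketch in plain Python, assuming the number only needs cleaning and is never used in arithmetic, so it can stay a string:

```python
# Sample value taken from the question.
raw = "07777 777777"

# Remove the space but keep the value as text.
cleaned = raw.replace(" ", "")

# Converting to int discards the leading zero.
as_int = int(cleaned)

print(cleaned)  # 07777777777
print(as_int)   # 7777777777
```

Whether the leading zero matters depends on how the numbers are used later; keeping the column as strings is the safer default if they will be displayed or dialled.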

CodePudding user response:

pandas' to_numeric sets any value it cannot parse to NaN when errors="coerce" is passed; fillna(0) then replaces those NaN values with 0.

In your situation, a space in the phone number will cause the parsing to fail and return NaN, which is then replaced with a 0.
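The chain described here can be reproduced on a small Series. This is only an illustrative sketch; the two sample values are made up to contrast a number without a space against one with a space:

```python
import pandas as pd

# Hypothetical sample: the first value has no space, the second does.
s = pd.Series(["07777777777", "07777 777777"])

# errors="coerce" turns anything unparseable into NaN instead of raising.
parsed = pd.to_numeric(s, errors="coerce")

# fillna(0) replaces the NaN, so the spaced number ends up as 0.
filled = parsed.fillna(0).astype("int64")

print(parsed.isna().tolist())  # [False, True]
print(filled.tolist())         # [7777777777, 0]
```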

Python's built-in str.replace does not take a regular expression. You can pass the substring or characters you want to remove directly:

"07777 777777".replace(' ', '')

or if you do need to use regex:

import re
re.sub(r'\s+','',"07777 777777")

CodePudding user response:

You can use apply like below:

import pandas as pd

df = pd.DataFrame({
    'Phone_Number': ['07777 777777', ' 07777  777 777', ' 07 777 777  777   ']})

df['Phone_Number'] = df['Phone_Number'].apply(lambda x : x.replace(' ', ''))

# OR
# df['Phone_Number'] = df['Phone_Number'].apply(lambda x : ''.join(x.split()))

print(df)

Output:

   Phone_Number
0   07777777777
1   07777777777
2   07777777777

CodePudding user response:

You can simplify the solution provided by @user1740577, as there is no need to use apply. Just use pandas' str.replace() like this:

df['Phone_Number'] = df['Phone_Number'].str.replace(' ', '')
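If the column may contain tabs or runs of several spaces, the same vectorised call can take a regular expression. A hedged sketch, assuming pandas 2.0 or later, where str.replace treats the pattern literally unless regex=True is passed; the sample values are invented for illustration:

```python
import pandas as pd

# Hypothetical sample data with mixed whitespace (spaces and a tab).
df = pd.DataFrame({
    "Phone_Number": ["07777 777777", " 07777\t777 777", "07 777 777  777 "]
})

# regex=True makes \s+ match any run of whitespace, not a literal string.
df["Phone_Number"] = df["Phone_Number"].str.replace(r"\s+", "", regex=True)

print(df["Phone_Number"].tolist())
# ['07777777777', '07777777777', '07777777777']
```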