I have the following piece of code, but something odd is happening: when a phone number has a space in the middle, it turns into 0. For example, 07777 777777 becomes 0.
import pandas as pd
df['Phone_Number'] = df['Phone_Number'].str.replace(r'[^\w\s]= ', '')
df['Phone_Number'] = pd.to_numeric(df['Phone_Number'], errors="coerce").fillna(0).astype('int64').to_frame()
Answer:
This is expected: a string with a space in the middle cannot be converted to an integer directly. However, you can remove the spaces first:
This is my DataFrame:
>>> df
     name        toy        born            aa
1  Batman  Batmobile  1940-04-25  07777 777777
Now, using apply:
>>> df["aa"] = df.apply(lambda row: int(row["aa"].replace(" ", "")), axis=1)
>>> df
     name        toy        born          aa
1  Batman  Batmobile  1940-04-25  7777777777
As you can see, the phone numbers are now integers. If you explicitly want int64, then:
import numpy as np
df["aa"] = df.apply(lambda row: np.int64(row["aa"].replace(" ", "")), axis=1)
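The row-wise apply is not strictly needed here. A minimal sketch, assuming the same single-row aa column as above, strips the spaces with the vectorized Series.str.replace and casts the whole column with astype("int64"). The leading 0 disappears because integers have no leading zeros.

```python
import pandas as pd

# Assumed sample data mirroring the single-row example above
df = pd.DataFrame({"aa": ["07777 777777"]})

# Remove spaces for the whole column at once, then cast to int64
df["aa"] = df["aa"].str.replace(" ", "", regex=False).astype("int64")

print(df["aa"].dtype)     # int64
print(df["aa"].tolist())  # [7777777777]
```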
Answer:
With errors="coerce", pd.to_numeric sets any value it cannot parse to NaN, and fillna(0) then replaces those NaN values with 0.
In your case, the space in the phone number makes parsing fail, so the value becomes NaN and is then replaced with 0.
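To see this happen, here is a small sketch that reproduces the question's to_numeric and fillna steps on one sample value (the Series here is illustrative, not the asker's actual data):

```python
import pandas as pd

# One phone number with a space in the middle, as in the question
s = pd.Series(["07777 777777"])

# errors="coerce" turns the unparseable string into NaN instead of raising
parsed = pd.to_numeric(s, errors="coerce")
print(parsed.isna().tolist())  # [True]

# fillna(0) then replaces that NaN with 0, which is the value the question observed
result = parsed.fillna(0).astype("int64")
print(result.tolist())  # [0]
```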
Python's built-in str.replace does not accept regular expressions (pandas' Series.str.replace does, when regex=True is passed). For a plain string, you can directly pass the substring or characters you want to remove:
"07777 777777".replace(' ', '')
Or, if you do need a regex (\s+ matches one or more whitespace characters):
import re
re.sub(r'\s+', '', "07777 777777")
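If the numbers are already in a DataFrame column, pandas' Series.str.replace can apply the same pattern to the whole column when regex=True is passed. A sketch with made-up sample values; \s+ also catches tabs and repeated spaces:

```python
import pandas as pd

# Hypothetical sample values with single, double, tab and edge whitespace
s = pd.Series(["07777 777777", " 07777  777\t777 "])

# regex=True makes pandas treat the pattern as a regular expression;
# every run of whitespace is removed
cleaned = s.str.replace(r"\s+", "", regex=True)
print(cleaned.tolist())  # ['07777777777', '07777777777']
```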
Answer:
You can use apply, like below:
import pandas as pd
df = pd.DataFrame({
'Phone_Number': ['07777 777777', ' 07777 777 777', ' 07 777 777 777 ']})
df['Phone_Number'] = df['Phone_Number'].apply(lambda x : x.replace(' ', ''))
# OR
# df['Phone_Number'] = df['Phone_Number'].apply(lambda x : ''.join(x.split()))
print(df)
Output:
Phone_Number
0 07777777777
1 07777777777
2 07777777777
Answer:
You can simplify the solution provided by @user1740577, since there is no need for apply. Just use pandas' vectorized str.replace() like this:
df['Phone_Number'] = df['Phone_Number'].str.replace(' ', '', regex=False)
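To keep the rest of the question's pipeline, the space removal can go before pd.to_numeric, so spaced numbers parse while genuinely invalid entries still become 0. This is a sketch; the "n/a" row is hypothetical sample data. Storing phone numbers as integers drops the leading 0, so if you need it, keep the cleaned strings and skip the numeric conversion.

```python
import pandas as pd

# Hypothetical sample data: a spaced number, an unspaced one, and an invalid entry
df = pd.DataFrame({"Phone_Number": ["07777 777777", "07777777777", "n/a"]})

# Strip the spaces first so to_numeric only sees digits
digits = df["Phone_Number"].str.replace(" ", "", regex=False)

# Keep the question's behaviour for truly invalid values: NaN -> 0
df["Phone_Number"] = (
    pd.to_numeric(digits, errors="coerce").fillna(0).astype("int64")
)
print(df["Phone_Number"].tolist())  # [7777777777, 7777777777, 0]
```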