I have a dataframe with a column of phone numbers. I wrote the following function to fill any NaNs with an empty string, and then add a '+' and a '1' to any phone numbers that need them.
def fixCampaignerPhone(phone):
    if phone.isnull():
        phone = ''
    phone = str(phone)
    if len(phone) == 10:
        phone = ('1' + phone)
    if len(phone) > 1:
        phone = ('+' + phone)
    return phone
I tried to apply this function to a column of a dataframe as follows:
df['phone'] = df.apply(lambda row: fixCampaignerPhone(row['phone']), axis =1)
My function did not correctly identify and replace NaN values; it raised the error "object of type 'float' has no len()". I worked around it with a .fillna() on a separate line, but I would like to understand why this didn't work. The function works if I manually pass a NaN value, so I assume it has to do with pandas passing the argument as some kind of float object rather than a regular float.
EDIT: full working code with sample data for debugging.
import pandas as pd
import numpy as np
def fixCampaignerPhone(phone):  # adds + and 1 to front of phone numbers if necessary
    if phone.isnull():
        phone = ''
    phone = str(phone)
    if len(phone) == 10:
        phone = ('1' + phone)
    if len(phone) > 1:
        phone = ('+' + phone)
    return phone
d = {0: float("NaN"), 1:"2025676789"}
sampledata = pd.Series(data = d, index = [0 , 1])
sampledata.apply(lambda row: fixCampaignerPhone(row))
EDIT 2: Changing phone.isnull() to pd.isna(phone) works for my sample data but not for my production dataset, so there must be a quirk somewhere in my data. For context, every phone number in my production dataset should be either NaN, an 11-digit string starting with 1, or a 10-digit string. However, when I run my lambda function on the production dataset, I get the error "object of type 'float' has no len()", so somehow some floats/NaNs are slipping past my if statement.
CodePudding user response:
Given this example DataFrame:
>>> import pandas as pd
>>> from io import StringIO
>>> df = pd.read_csv(StringIO("""
A,phone
L,3453454564
L,345345
R,345345
h,3
A,345345
L,345345
R,3453434543
R,345345
R,345345
R,345345
"""), sep=',')
>>> df
   A       phone
0  L  3453454564
1  L      345345
2  R      345345
3  h           3
4  A      345345
5  L      345345
6  R  3453434543
7  R      345345
8  R      345345
9  R      345345
We can use select from numpy to build our if logic and get the expected result:
import numpy as np
df['phone'] = df['phone'].astype(str)
condlist = [df['phone'].str.len() == 10,
            df['phone'].str.len() > 1]
choicelist = ['1' + df['phone'],
              '+' + df['phone']]
df['phone'] = np.select(condlist, choicelist, default='')
Output:
   A        phone
0  L  13453454564
1  L      +345345
2  R      +345345
3  h
4  A      +345345
5  L      +345345
6  R  13453434543
7  R      +345345
8  R      +345345
9  R      +345345
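Two details of this approach are worth noting. np.select uses the first matching condition for each row, so a 10-digit number gets the '1' prefix but not the '+'. And astype(str) turns a NaN into the literal string 'nan'. A minimal sketch of a variant that handles both, assuming the sample values from the question (the small DataFrame below is only illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative data: a NaN, a 10-digit number, and an 11-digit number
df = pd.DataFrame({"phone": [np.nan, "2025676789", "12025676789"]})

# Replace NaN before converting, so it does not become the string "nan"
phone = df["phone"].fillna("").astype(str)
length = phone.str.len()

# np.select takes the first matching condition per row, so the
# 10-digit branch has to add both the "+" and the "1" itself
df["phone"] = np.select(
    [length == 10, length > 1],
    ["+1" + phone, "+" + phone],
    default="",  # as in the answer above, entries of length <= 1 become empty
)
print(df)
```

Here the NaN row ends up as an empty string, and both phone numbers come out in the form +1 followed by ten digits.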
CodePudding user response:
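Here is a working piece of code. You have to use pd.isnull(phone) instead of phone.isnull(): apply passes each element to the function as a scalar (a float NaN or a str), and scalars have no .isnull() method, whereas pd.isnull is a top-level function that accepts scalars.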
import pandas as pd
import numpy as np
def fixCampaignerPhone(phone):  # adds + and 1 to front of phone numbers if necessary
    if pd.isnull(phone):
        phone = ''
    phone = str(phone)
    if len(phone) == 10:
        phone = ('1' + phone)
    if len(phone) > 1:
        phone = ('+' + phone)
    return phone
d = {0: float("NaN"), 1:"2025676789"}
sampledata = pd.Series(data = d, index = [0 , 1])
r=sampledata.apply(lambda row: fixCampaignerPhone(row))
print(r)
Result:
0
1    +12025676789
dtype: object
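For the production data mentioned in the question's second edit, one way to look for the values that slip through is to count the Python types actually stored in the column. This is only a sketch: the Series below is a hypothetical stand-in for the real column, with one phone number deliberately stored as a float.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the production column
phones = pd.Series([np.nan, "2025676789", "12025676789", 2025676789.0])

# Count how many values of each Python type the column holds
type_counts = phones.map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Rows that are not missing but are also not strings
is_str = phones.map(lambda v: isinstance(v, str)).astype(bool)
suspects = phones[phones.notna() & ~is_str]
print(suspects)
```

Any rows listed in suspects are non-null values that are not strings, which would be candidates for the cause of a len() error on a float.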