I am trying to apply a function to a pandas series that checks the first 3 as well as the first 2 characters of the values in the series.
If either matches, the first 3 or 2 characters (depending on which matched) need to be replaced with '0'; the rest of the characters remain the same.
The original dtype was 'O'. I have tried converting it to 'string' but still can't get this to work.
Sample data looks like so:
012xxxxxxx
27xxxxxxxx
011xxxxxxx
27xxxxxxxx
etc...
The condition I am evaluating is: if the first 3 characters == '+27', replace '+27' with '0'; or if the first 2 characters == '27', replace '27' with '0'.
I have the following apply method but the values aren't being updated.
def normalize_number(num):
    if num[:3] == '+27':
        # num.str.replace(num[:3], '0') ## First Method
        return '0' + num[4:] ## Second Method
    else:
        return num
    if num[:2] == '27':
        # num.str.replace(num[:2], '0')
        return '0' + num[3:]
    else:
        return num

df['number'].apply(normalize_number)
What am I missing here?
CodePudding user response:
It looks like you should use a regex here. If the string starts with 27, with an optional + in front, replace that prefix with 0:
    df['number2'] = df['number'].str.replace(r'^\+?27', '0', regex=True)
output:
       number     number2
0  012xxxxxxx  012xxxxxxx
1  27xxxxxxxx   0xxxxxxxx
2  011xxxxxxx  011xxxxxxx
3  27xxxxxxxx   0xxxxxxxx
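For anyone who wants to reproduce this, here is a minimal sketch of the regex replacement on a small frame. The sample values are assumptions based on the question's data, the optional leading character is assumed to be '+', and the '+27xxxxxxxx' row is added only to exercise that optional '+'.

```python
import pandas as pd

# Hypothetical sample data modeled on the question; the last row is an
# added assumption to show the optional leading '+'.
df = pd.DataFrame(
    {"number": ["012xxxxxxx", "27xxxxxxxx", "011xxxxxxx", "+27xxxxxxxx"]}
)

# '^' anchors the match at the start, so a '27' later in the string is untouched.
df["number2"] = df["number"].str.replace(r"^\+?27", "0", regex=True)

result = df["number2"].tolist()
print(result)
```

Note that `str.replace` returns a new Series, which is why the result is assigned to a column rather than relying on an in-place change.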
Why your approach failed
Your approach failed because the first if/else returns on both branches, so the function exits
before the second check is ever reached. You should have used:
def normalize_number(num):
    if num[:3] == '+27':
        return '0' + num[3:]  # drop the 3-character prefix '+27'
    elif num[:2] == '27':
        return '0' + num[2:]  # drop the 2-character prefix '27'
    else:
        return num
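As a rough, self-contained sketch of the same idea, the function can be written with `str.startswith` and sliced by the prefix length, which avoids hard-coding the offsets. The sample list is hypothetical, and the '+' prefix is an assumption about what the question intended.

```python
def normalize_number(num: str) -> str:
    """Replace a leading '+27' or '27' with '0'; return other values unchanged."""
    for prefix in ("+27", "27"):
        if num.startswith(prefix):
            # Slice by the prefix length so no extra character is dropped.
            return "0" + num[len(prefix):]
    return num


# Hypothetical sample values modeled on the question's data.
samples = ["012xxxxxxx", "27xxxxxxxx", "+27xxxxxxxx", "011xxxxxxx"]
normalized = [normalize_number(s) for s in samples]
print(normalized)
```

When used with pandas, remember that `apply` returns a new Series rather than modifying the column, so the result has to be assigned back, for example `df['number'] = df['number'].apply(normalize_number)`.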
NB. Prefer the regex approach above; it is more concise and typically faster than apply with a Python function.
regex
^     # match start of string
\+    # match literal +
?     # make previous match (the "+") optional
27    # match literal 27
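The breakdown above can also be checked outside pandas. The sketch below uses the standard library's `re.VERBOSE` flag so the pattern carries the same comments; `strip_prefix` is a hypothetical helper name, and the '+' is assumed to be the optional leading character.

```python
import re

# Same pattern as above, written in verbose mode so each part is annotated.
PREFIX = re.compile(
    r"""
    ^     # start of string
    \+?   # optional literal '+'
    27    # literal 27
    """,
    re.VERBOSE,
)


def strip_prefix(num: str) -> str:
    """Replace a leading '27' or '+27' with '0' (hypothetical helper)."""
    return PREFIX.sub("0", num, count=1)


print(strip_prefix("+27123456"))  # leading '+27' replaced
print(strip_prefix("012127"))     # '27' not at the start, left unchanged
```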