I have a sample dataset df like this:
tel_no
tel: 1-860-752-8792
tel: 1-949-722-8838
The goal is to get this output:
tel_no
18607528792
19497228838
Here is my attempt:
df['tel_no'].apply(lambda x: x.replace(i, '') for i in [' ','-','tel:'])
But this gives an error message:
TypeError: 'generator' object is not callable
I am aware that it can be done in three separate lines, one for each substring, but I was wondering whether it can be done in one line, as above. Any help is appreciated.
Answer:
An easier way to go would be to use the pandas str methods, namely findall (to find all runs of digits using the regex \d+) and join (to join the resulting list of digit substrings together):
>>> df.tel_no.str.findall(r"\d+").str.join("")
0 18607528792
1 19497228838
Name: tel_no, dtype: object
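To see what that pipeline does to a single value, here is a plain-Python sketch using the standard re module on one of the sample strings (the variable names are just for illustration). Series.str.findall applies the same regex to each element, and .str.join("") then concatenates each resulting list. The last line shows a swapped-in alternative that deletes every non-digit with \D instead of collecting the digits:

```python
import re

tel = 'tel: 1-860-752-8792'

# \d+ matches each maximal run of digits; findall returns them as a list.
runs = re.findall(r'\d+', tel)
print(runs)             # ['1', '860', '752', '8792']
print(''.join(runs))    # 18607528792

# Alternative: delete every non-digit character (\D) in a single call.
print(re.sub(r'\D', '', tel))   # 18607528792
```

In pandas the \D version would be df['tel_no'].str.replace(r'\D', '', regex=True); passing regex=True explicitly matters because recent pandas versions treat the pattern as a literal string by default.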
Answer:
I agree that using regex matching is a good solution to your problem, but I can at least address the problem with your code.
Your current code is:
df['tel_no'].apply(lambda x: x.replace(i, '') for i in [' ','-','tel:'])
Python parses this (perhaps surprisingly) as:
df['tel_no'].apply(
    (
        (lambda x: x.replace(i, ''))
        for i in [' ', '-', 'tel:']
    )
)
That is, you have written a generator comprehension, creating a new anonymous function at each iteration of the loop. You have not created a single anonymous function with a generator comprehension inside it!
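You can confirm this parse without running anything by asking Python's own parser via the standard ast module. This is a small sketch; it only parses the original line as source text, so df does not need to exist:

```python
import ast

# The original line, parsed as an expression but never executed.
src = "df['tel_no'].apply(lambda x: x.replace(i, '') for i in [' ', '-', 'tel:'])"
call = ast.parse(src, mode="eval").body  # an ast.Call node for .apply(...)

# apply() receives exactly one argument, and it is a generator expression...
arg = call.args[0]
print(type(arg).__name__)      # GeneratorExp

# ...whose element (the thing produced on each iteration) is the lambda.
print(type(arg.elt).__name__)  # Lambda
```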
Generators are not callable, but apply tries to call whatever you pass it on each value, which is what caused the error.
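The same error can be reproduced in plain Python, without pandas, by building the generator and then calling it the way apply would (the name funcs is only for illustration):

```python
# The object the original code actually passed to apply().
funcs = (lambda x: x.replace(i, '') for i in [' ', '-', 'tel:'])

print(callable(funcs))  # False

message = None
try:
    funcs('tel: 1-860-752-8792')  # what apply effectively does with it
except TypeError as exc:
    message = str(exc)

print(message)  # 'generator' object is not callable
```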
Your attempt reflects two additional misunderstandings:
1. Comprehension syntax cannot be used outside of an actual comprehension. Perhaps you meant to write
lambda x: (x.replace(i, '') for i in [' ', '-', 'tel:'])
which would at least be one function that contains a generator comprehension. Even then, that function returns a generator of three strings, each with only one of the substrings removed, rather than a single cleaned string.
2. String methods like str.replace do not modify the string in place; they return a new string. See the example below.
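Here is a quick sketch of what the corrected-parentheses lambda from point 1 actually produces for one sample value (assigning a lambda to a name is only done here to make the demo readable):

```python
# One function whose body is a generator expression.
clean = lambda x: (x.replace(i, '') for i in [' ', '-', 'tel:'])

print(callable(clean))  # True: apply could call this one

# Each item comes from the ORIGINAL string with just one substring removed,
# because replace() returns a new string instead of updating x.
outputs = list(clean('tel: 1-860-752-8792'))
print(outputs)
# ['tel:1-860-752-8792', 'tel: 18607528792', ' 1-860-752-8792']
```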
s1 = 'hello'
s2 = s1.replace('e', 'f')
# s1 will be unchanged
assert s1 == 'hello'
# s2 will be changed
assert s2 == 'hfllo'
To write this as a single function with a loop, you would need to use def, not lambda, because a lambda body can only be one expression, not a for statement:
def clean_tel(x):
    for bad_string in [' ', '-', 'tel:']:
        x = x.replace(bad_string, '')
    return x

df['tel_no'].apply(clean_tel)
Or you can omit the loop and write it like this:
df['tel_no'].apply(
lambda x: x.replace(' ', '').replace('-', '').replace('tel:', '')
)
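If you really want a single lambda that still loops over the list of substrings, functools.reduce can thread the string through one replace() call per substring, which is the clean_tel loop written as one expression. This is a sketch; BAD_STRINGS and clean_tel_oneline are hypothetical names chosen for the example:

```python
from functools import reduce

BAD_STRINGS = [' ', '-', 'tel:']

# reduce starts from x and applies s.replace(bad, '') for each bad substring,
# feeding each result into the next step.
clean_tel_oneline = lambda x: reduce(lambda s, bad: s.replace(bad, ''), BAD_STRINGS, x)

print(clean_tel_oneline('tel: 1-860-752-8792'))  # 18607528792
print(clean_tel_oneline('tel: 1-949-722-8838'))  # 19497228838
```

With pandas it would be used the same way as the def version: df['tel_no'].apply(clean_tel_oneline). Whether that is clearer than the explicit loop is a matter of taste.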