Use lambda function for multiple replacement python

I have a sample dataset df like this:

tel_no
 tel: 1-860-752-8792
 tel: 1-949-722-8838

The goal is to get the output as:

tel_no
18607528792
19497228838

Here is my attempt;

df['tel_no'].apply(lambda x: x.replace(i, '') for i in [' ','-','tel:'])

But this gives an error message;

TypeError: 'generator' object is not callable

I am aware that it can be done in 3 separate lines, one for each substring. But I was wondering whether we can do it in one line as above. Help is appreciated.

CodePudding user response：

An easier way to go would be to use pandas str methods, namely findall (to find all runs of digits using the regex \d+) and join (to join the resulting list of digit substrings together):

>>> df.tel_no.str.findall(r"\d+").str.join("")

0    18607528792
1    19497228838
Name: tel_no, dtype: object
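
For a single string, the same idea can be sketched with the standard re module. This is only an illustration of what findall and join do to each value, not a description of how pandas implements them; the variable names are made up for the example.

```python
import re

# Sample values copied from the question.
raw_numbers = [" tel: 1-860-752-8792", " tel: 1-949-722-8838"]

# re.findall returns every run of digits; "".join glues them back together.
digit_runs = re.findall(r"\d+", raw_numbers[0])
cleaned = ["".join(re.findall(r"\d+", s)) for s in raw_numbers]

print(digit_runs)  # ['1', '860', '752', '8792']
print(cleaned)     # ['18607528792', '19497228838']
```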

CodePudding user response：

I agree that using regex matching is a good solution to your problem, but I can at least address the problem with your code.

Your current code is:

df['tel_no'].apply(lambda x: x.replace(i, '') for i in [' ','-','tel:'])

Python parses this (perhaps surprisingly) as:

df['tel_no'].apply(
    (
        (lambda x: x.replace(i, ''))
        for i in [' ', '-', 'tel:']
    )
)

That is, you have written a generator expression that creates a new anonymous function at each iteration of the loop. You have not created a single anonymous function with a generator expression inside it!

apply expects something it can call, but it received a generator object, and generators are not callable. That is what caused the error.
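
The error can be reproduced without pandas at all. A minimal sketch, where gen and message are just names chosen for this example:

```python
# Same expression as the argument passed to apply in the question.
gen = (lambda x: x.replace(i, '') for i in [' ', '-', 'tel:'])

print(type(gen).__name__)  # generator
print(callable(gen))       # False

try:
    # Calling the generator itself is what fails.
    gen(' tel: 1-860-752-8792')
except TypeError as exc:
    message = str(exc)
    print(message)  # 'generator' object is not callable
```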

Your attempt reflects two additional misunderstandings:

1. Comprehension syntax cannot be used outside of an actual comprehension. Perhaps you meant to write lambda x: (x.replace(i, '') for i in [' ','-','tel:']), which would at least be one function that contains a generator expression. Even so, it would return a generator rather than a cleaned string.
2. String methods like str.replace do not modify the string. They return a new string. See the example below.

s1 = 'hello'
s2 = s1.replace('e', 'f')

# s1 will be unchanged
assert s1 == 'hello'

# s2 will be changed
assert s2 == 'hfllo'

To write this as a function, you would need to use def, not lambda:

def clean_tel(x):
    for bad_string in [' ', '-', 'tel:']:
        x = x.replace(bad_string, '')
    return x

df['tel_no'].apply(clean_tel)

Or you can omit the loop and write it like this:

df['tel_no'].apply(
    lambda x: x.replace(' ', '').replace('-', '').replace('tel:', '')
)
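
If you really want a single lambda that loops over the list, one option is functools.reduce, which feeds the result of each replace into the next one. This is a sketch rather than a recommendation; BAD_STRINGS and clean are names I made up for the example.

```python
from functools import reduce

# Substrings to strip, taken from the question.
BAD_STRINGS = [' ', '-', 'tel:']

# One lambda: reduce starts from x and applies replace once per bad string.
clean = lambda x: reduce(lambda s, bad: s.replace(bad, ''), BAD_STRINGS, x)

print(clean(' tel: 1-860-752-8792'))  # 18607528792
```

With the DataFrame from the question, this would be passed as df['tel_no'].apply(clean). The def version above is arguably easier to read.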