Use lambda function for multiple replacement python

Time:09-23

I have a sample dataset df like this:

tel_no
 tel: 1-860-752-8792
 tel: 1-949-722-8838

The goal is to get the following output:

tel_no
18607528792
19497228838

Here is my attempt:

df['tel_no'].apply(lambda x: x.replace(i, '') for i in [' ','-','tel:'])

But this gives an error message:

TypeError: 'generator' object is not callable

I am aware that it can be done in 3 separate lines, one for each replacement. But I was wondering whether it can be done in one line as above. Help is appreciated.

CodePudding user response:

An easier way to go would be to use pandas str methods, namely findall (to find all runs of digits using the regex \d+) and join (to join the resulting list of digit substrings together):

>>> df.tel_no.str.findall(r"\d+").str.join("")

0    18607528792
1    19497228838
Name: tel_no, dtype: object
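
To check the pattern on a single value before running it on the whole column, the same idea can be sketched in plain Python with the standard re module. The sample string is taken from the question; the variable names are only illustrative.

```python
import re

sample = " tel: 1-860-752-8792"

# \d+ matches each run of consecutive digits
parts = re.findall(r"\d+", sample)
digits = "".join(parts)

# Alternative: delete every non-digit character in one pass
digits_alt = re.sub(r"\D", "", sample)

print(parts)       # ['1', '860', '752', '8792']
print(digits)      # 18607528792
print(digits_alt)  # 18607528792
```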

CodePudding user response:

I agree that using regex matching is a good solution to your problem, but I can at least address the problem with your code.

Your current code is:

df['tel_no'].apply(lambda x: x.replace(i, '') for i in [' ','-','tel:'])

Python parses this (perhaps surprisingly) as:

df['tel_no'].apply(
    (
        (lambda x: x.replace(i, ''))
        for i in [' ','-','tel:']
    )
)

That is, you have written a generator expression that creates a new anonymous function at each iteration of the loop. You have not created a single anonymous function with a generator expression inside it!

apply tries to call whatever object it is given, and generator objects are not callable, which is what caused the error.
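
The error can be reproduced without pandas. The following is a minimal sketch that builds the same generator and tries to call it the way apply would; the variable names are illustrative only.

```python
# Same expression as in the question, written with explicit parentheses:
# a generator that would yield three separate lambdas
gen = (lambda x: x.replace(i, '') for i in [' ', '-', 'tel:'])

print(callable(gen))  # False

try:
    gen(' tel: 1-860-752-8792')  # calling the generator itself fails
except TypeError as exc:
    message = str(exc)
    print(message)  # 'generator' object is not callable
```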

Your attempt reflects two additional misunderstandings:

  1. Comprehension syntax cannot be used outside of an actual comprehension. Perhaps you meant to write lambda x: (x.replace(i, '') for i in [' ','-','tel:']), which would at least be one function that contains a generator expression.

  2. String functions like str.replace do not modify the string. They return a new string. See the example below.

s1 = 'hello'
s2 = s1.replace('e', 'f')

# s1 will be unchanged
assert s1 == 'hello'

# s2 will be changed
assert s2 == 'hfllo'
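
Returning to point 1, here is a small sketch of what a single lambda containing a generator expression would actually return. Each replacement is applied to the original string independently, so the result is three partially cleaned strings rather than one fully cleaned string. The name make_variants is hypothetical.

```python
# One function whose body is a generator expression
make_variants = lambda x: (x.replace(i, '') for i in [' ', '-', 'tel:'])

variants = list(make_variants(' tel: 1-860-752-8792'))

print(variants)
# ['tel:1-860-752-8792', ' tel: 18607528792', '  1-860-752-8792']
```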

To write this as a function with a loop, you would need to use def, not lambda, because a lambda body can only be a single expression:

def clean_tel(x):
    for bad_string in [' ', '-', 'tel:']:
        x = x.replace(bad_string, '')
    return x

df['tel_no'].apply(clean_tel)

Or you can omit the loop and write it like this:

df['tel_no'].apply(
    lambda x: x.replace(' ', '').replace('-', '').replace('tel:', '')
)
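
If the list of strings to remove should stay in one place, one possible alternative (not part of the answer above) is to chain the replacements with functools.reduce from the standard library, which still fits in a single lambda. This is a sketch on a plain string; bad_strings and clean_tel_one_line are hypothetical names.

```python
from functools import reduce

bad_strings = [' ', '-', 'tel:']

# reduce feeds the result of each replace into the next one,
# starting from the original string x
clean_tel_one_line = lambda x: reduce(
    lambda s, bad: s.replace(bad, ''), bad_strings, x
)

result = clean_tel_one_line(' tel: 1-860-752-8792')
print(result)  # 18607528792
```

With pandas, the same function could be passed as df['tel_no'].apply(clean_tel_one_line). Whether this reads better than the explicit def version is a matter of taste.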