Creating a function by using lambda and apply

Time:08-31

I want to create a function using lambda and apply. The function should return "1.30" or "0" depending on a condition: if the value in the column is greater than 30, it returns "1.30"; otherwise it returns "0". I then need to apply this function to every element in the column.

I tried something, but it didn't work. Here is what I did:

def catch(column_name):
    if column_name < 30:
        print("1.30")
    else:
        print("0")

df.loc[:,df.columns.str.contains("age")].apply(lambda catch: catch > 30)

When I run this, it just shows True or False, which is not what I want.

CodePudding user response:

The purpose of lambda functions is that you define your function inline, so you don't need an external function definition. When you write catch inside the lambda definition, that becomes the name of a variable and is not referencing the function you defined above.
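To illustrate the shadowing described above, here is a minimal sketch in plain Python. The body of catch and the name shadowed are made up for the demonstration and are not from your code.

```python
def catch(x):
    # Hypothetical body, only here to show that this function is never called.
    return "from the named function"

# The lambda's parameter is also named `catch`, so inside the lambda the name
# refers to the argument that was passed in, not to the function defined above.
shadowed = lambda catch: catch > 30

print(shadowed(50))  # the result of the comparison, not the function's return value
print(shadowed(10))
```

This is why your version only ever produced True or False: the lambda body is just the comparison.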

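You have two options. To use a lambda function, note that apply on a DataFrame passes each column to the function as a whole Series, so the element-wise test goes in an inner apply on that column:

df.loc[:,df.columns.str.contains("age")].apply(lambda col: col.apply(lambda x: 1.30 if x > 30 else 0))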

No need for a function definition in this case. Alternatively, to use a standard function you can write:

def catch(x):
    if x > 30:
        return 1.30
    else:
        return 0

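# apply on a DataFrame passes whole columns, so call catch per element within each column
df.loc[:,df.columns.str.contains("age")].apply(lambda col: col.apply(catch))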

I haven't tested the df.loc part of your code because you didn't provide sample data or context, but it looks a little off. The standard syntax for a single column is df[column_name].apply(function), which calls the function once per element. For example, to see the result of applying the function to a column named 'age', you could write print(df['age'].apply(lambda x: 1.30 if x > 30 else 0)). Of course, the values in the column need to be numeric.
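As a minimal sketch of the single-column form, assuming a hypothetical DataFrame with a numeric 'age' column (the question did not include data, so the values below are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; not taken from the question.
df = pd.DataFrame({"name": ["a", "b", "c"], "age": [25, 31, 30]})

# Series.apply calls the lambda once per element, so x is a single number here.
result = df["age"].apply(lambda x: 1.30 if x > 30 else 0)
print(result)
```

Only the middle row exceeds 30, so it is the only one mapped to 1.30. A value of exactly 30 maps to 0 because the test is a strict greater-than.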

CodePudding user response:

Another solution, based on np.where, avoids the ambiguous truth value error that pandas raises when a condition evaluates to a whole Series of True and False values. See: https://www.learndatasci.com/solutions/python-valueerror-truth-value-series-ambiguous-use-empty-bool-item-any-or-all/

>>> import pandas as pd
>>> import numpy as np

>>> data = {"col1": [1,2,3,4,5], "age_1": [50,20,60,40,10], "age_2": [5,70,60,3,10]}
>>> df = pd.DataFrame(data)
>>> df

   col1  age_1  age_2
0     1     50      5
1     2     20     70
2     3     60     60
3     4     40      3
4     5     10     10

>>> df[['age_1_p', "age_2_p"]] = np.where(df.iloc[:,df.columns.str.contains("age")]>30, 1.3, 0)
>>> df[['age_1_p', "age_2_p"]]

   age_1_p  age_2_p
0      1.3      0.0
1      0.0      1.3
2      1.3      1.3
3      1.3      0.0
4      0.0      0.0
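For context on the ambiguity error mentioned at the start of this answer, here is a small sketch of how it arises. The Series ages and the flag raised are made-up names for the demonstration.

```python
import pandas as pd

# Hypothetical data for the demonstration.
ages = pd.Series([50, 20, 60])

try:
    # `ages > 30` is a Series of booleans, and `if` needs a single True/False,
    # so pandas refuses to guess and raises ValueError.
    if ages > 30:
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

np.where sidesteps this because it evaluates the condition element by element instead of asking for one truth value for the whole Series.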