How to Re-Write Lambda Function (via Pandas .apply() method) to Beat Famous "ValueError: The tr

Time:03-25

Edit: Solutions posted in this notebook. Special thanks to Étienne Célèry and ifly6!


I am trying to figure out how to beat the feared error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

import pandas as pd

d = {
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300]
}
df = pd.DataFrame(data=d)
df_2 = df.copy()  # for use in the second, non-lambda part
print(df)

And this outputs:

          nickname state  score
0           bobg89    NY    100
1        coolkid34    CA    200
2  livelaughlove38    TN    300

Then the goal is to add 50 to the score if they are from NY.

def add_some_love(state_value, score_value, name):
    # state_value must be a single string here, not a whole column
    if state_value == name:
        return score_value + 50
    else:
        return score_value

Then we can apply that function with a lambda function.

df['love_added'] = df.apply(lambda x: add_some_love(x.state, x.score, 'NY'), axis=1)
print(df)

And that gets us:

          nickname state  score  love_added
0           bobg89    NY    100         150
1        coolkid34    CA    200         200
2  livelaughlove38    TN    300         300

Here is where I tried writing it without the lambda, and that's where I get the error.

It seems like @MSeifert's answer here explains why this happens: the function is looking at a whole column instead of a single value in a row. But I also thought that passing axis=1 to the .apply() method would apply the function row-wise and fix the problem.

So I then do this:

df_2['love_added'] = df_2.apply(add_some_love(df_2.state, df_2.score, 'NY'), axis=1)
print(df_2)

And then you get the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So I've tried these solutions, but I can't seem to figure out how to rewrite the add_some_love() function so that it can run properly without the lambda function.

Does anyone have any advice?

Thanks so much for your time and consideration.

Answer:

What you could do instead is use np.where:

import numpy as np

df['love_added'] = np.where(df['state'] == 'NY', df['score'] + 50, df['score'])

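This produces the same scores as your applied function, and because np.where works on whole arrays at once instead of calling a Python function for every row, it is typically much faster on large DataFrames.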
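A sketch of how the same vectorized idea can keep the name parameter of the original add_some_love, so it works for any state rather than only 'NY'. The function name add_some_love_vectorized and its bonus parameter are hypothetical, introduced here for illustration; the body only uses np.where and ordinary pandas column operations.

```python
import numpy as np
import pandas as pd


def add_some_love_vectorized(df: pd.DataFrame, name: str, bonus: int = 50) -> pd.Series:
    """Hypothetical vectorized counterpart of add_some_love.

    Works on whole columns at once: adds `bonus` to 'score' wherever
    'state' equals `name`, and leaves the other scores unchanged.
    """
    is_match = df['state'] == name  # boolean Series, one entry per row
    # np.where chooses element-wise between the two candidate columns
    return pd.Series(np.where(is_match, df['score'] + bonus, df['score']), index=df.index)


df = pd.DataFrame({
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300],
})

# The original 'score' column is left as-is; the result goes to a new column
df['love_added'] = add_some_love_vectorized(df, 'NY')
print(df)
```

Because the helper returns a new Series instead of modifying the frame, the same call can be reused for other states, e.g. add_some_love_vectorized(df, 'CA').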


The issue with the non-lambda version is that you are not actually passing rows to your function. Python evaluates add_some_love(df_2.state, df_2.score, 'NY') first, before .apply() is even called, so the function receives the whole state and score columns, because that's what you told the computer to do. The axis=1 argument never gets a chance to matter.

What's going on in your function is the computer asking:

# if state_value == name ...  (state_value is the whole 'state' column)
if df_2['state'] == 'NY':
    ...

That naturally raises your error, because df_2['state'] == 'NY' is a boolean Series (one True/False per row), not the single boolean value an if statement needs.
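A minimal reproduction of the error, assuming a small standalone Series in place of the question's DataFrame. It also shows why the suggestions in the message (.any(), .all()) don't solve this particular problem: they collapse the whole column into one True/False, whereas the question needs a separate decision for each row.

```python
import pandas as pd

states = pd.Series(['NY', 'CA', 'TN'])
is_ny = states == 'NY'  # boolean Series: one True/False per element

# An `if` needs exactly one bool; a multi-element Series refuses to pick one.
message = None
try:
    if is_ny:
        pass
except ValueError as err:
    message = str(err)
print('ValueError:', message)

# Reducing to a single bool is allowed, but it answers a different question:
print(is_ny.any())  # at least one element is 'NY'
print(is_ny.all())  # every element is 'NY'
```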

Answer:

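Your add_some_love function needs a single string as state_value in order to execute the comparison if state_value == name. When you use .apply() with axis=1 and a lambda, pandas calls the lambda once per row, and the lambda pulls the single values x.state and x.score out of that row before calling add_some_love, instead of passing it the whole Series. You can't pass that exact add_some_love function to .apply() on its own, because .apply() would hand it the entire row as its first argument.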
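To see what .apply(..., axis=1) actually hands to a function, here is a small sketch that passes the function object itself (not the result of calling it) and records each argument it receives. The names inspect_row and seen are hypothetical, made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300],
})

seen = []  # hypothetical log of what the function was called with


def inspect_row(row):
    # With axis=1, `row` is a Series holding one row's values,
    # so row['state'] is a plain string such as 'NY'.
    seen.append((type(row).__name__, row['state']))
    return row['state'] == 'NY'


# Note: inspect_row, not inspect_row(...); .apply() does the calling, once per row.
flags = df.apply(inspect_row, axis=1)
print(seen)
print(flags.tolist())
```

The contrast with the failing line in the question is the parentheses: add_some_love(...) runs immediately on whole columns, while a bare function name lets .apply() call it with one row at a time.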

If you still want to use apply(), try this function:

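def add_some_love(row, name):
    # With axis=1, row is one row of the DataFrame (a Series),
    # so row['state'] is a single string that `if` can test.
    if row['state'] == name:
        row['score'] = row['score'] + 50
    return row

# args=('NY',) supplies the extra positional argument `name`
df_2 = df_2.apply(add_some_love, axis=1, args=('NY',))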
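If you would rather leave score untouched and write the result to a new love_added column, as in the question, a row function can return a single value instead of the modified row. A sketch, assuming the same sample data; love_for_row and its bonus parameter are hypothetical names.

```python
import pandas as pd

df_2 = pd.DataFrame({
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300],
})


def love_for_row(row, name, bonus=50):
    """Hypothetical row-wise helper: return the adjusted score for one row."""
    if row['state'] == name:  # row['state'] is a single string here
        return row['score'] + bonus
    return row['score']


# Pass the function object plus its extra positional argument via `args`;
# no lambda is needed, and returning a scalar gives a Series, one value per row.
df_2['love_added'] = df_2.apply(love_for_row, axis=1, args=('NY',))
print(df_2)
```

Extra keyword arguments to .apply() are forwarded to the function as well, so df_2.apply(love_for_row, axis=1, name='NY', bonus=100) should also work.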

However, a vectorized boolean mask with .loc should be faster and more efficient than apply():

FILTER = df_2['state'] == 'NY'  # True for rows from NY
df_2.loc[FILTER, 'score'] += 50
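A quick consistency check, sketched under the assumption that all three approaches in this thread should agree: it runs the .loc mask, np.where, and the row-wise lambda on fresh copies of the sample data and prints the resulting scores. It does not time them; for that, something like timeit on a much larger frame would be needed.

```python
import numpy as np
import pandas as pd

data = {
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300],
}

# 1) Boolean mask + .loc: only the NY rows of 'score' are updated in place
by_loc = pd.DataFrame(data)
mask = by_loc['state'] == 'NY'
by_loc.loc[mask, 'score'] += 50

# 2) np.where: builds the whole new column in one vectorized call
by_where = pd.DataFrame(data)
by_where['score'] = np.where(by_where['state'] == 'NY', by_where['score'] + 50, by_where['score'])

# 3) Row-wise .apply() with a lambda, as in the question (typically the slowest)
by_apply = pd.DataFrame(data)
by_apply['score'] = by_apply.apply(
    lambda row: row['score'] + 50 if row['state'] == 'NY' else row['score'], axis=1
)

for label, frame in [('loc', by_loc), ('where', by_where), ('apply', by_apply)]:
    print(label, frame['score'].tolist())
```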