How to Re-Write Lambda Function (via Pandas .apply() method) to Beat Famous "ValueError: The tr...

Edit: Solutions posted in this notebook. Special thanks to Étienne Célèry and ifly6!

I am trying to figure out how to beat the feared error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

import pandas as pd

d = {
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300]
}
df = pd.DataFrame(data=d)
df_2 = df.copy()  # for use in the second, non-lambda part
print(df)

And this outputs:

          nickname state  score
0           bobg89    NY    100
1        coolkid34    CA    200
2  livelaughlove38    TN    300

Then the goal is to add 50 to the score if they are from NY.

def add_some_love(state_value, score_value, name):
    # Add 50 to the score when the state matches the given name
    if state_value == name:
        return score_value + 50
    else:
        return score_value

Then we can apply that function with a lambda function.

df['love_added'] = df.apply(lambda x: add_some_love(x.state, x.score, 'NY'), axis=1)
print(df)

And that gets us:

          nickname state  score  love_added
0           bobg89    NY    100         150
1        coolkid34    CA    200         200
2  livelaughlove38    TN    300         300

And here is where I tried writing it, without the lambda, and that's where I get the error.

It seems like @MSeifert's answer here explains why this happens: the function is looking at a whole column instead of a single row's value in that column. But I also thought that passing axis=1 into the .apply() method would apply the function row-wise and fix the problem.

So I then do this:

df_2['love_added'] = df_2.apply(add_some_love(df_2.state, df_2.score, 'NY'), axis=1)
print(df_2)

And then you get the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So I've tried these solutions, but I can't seem to figure out how to rewrite the add_some_love() function so that it can run properly without the lambda function.

Does anyone have any advice?

Thanks so much for your time and consideration.

Answer:

What you could do instead is use np.where:

import numpy as np
df['score'] = np.where(df['state'] == 'NY', df['score'] + 50, df['score'])

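This would produce the same outcome as your applied function while also being much more performant, because the comparison and the addition run on the whole column at once instead of calling a Python function for every row.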
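As a quick check, here is a minimal sketch that writes the np.where result into a separate column instead of overwriting score. The column name love_added is borrowed from the question, and the sketch assumes pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300],
})

# Element-wise choice: score + 50 where state is 'NY', otherwise the unchanged score.
df['love_added'] = np.where(df['state'] == 'NY', df['score'] + 50, df['score'])

print(df['love_added'].tolist())  # [150, 200, 300]
```

Keeping the result in its own column leaves the original score untouched, which makes it easy to compare against the output of the .apply() version.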

The issue with dropping the lambda is that you are no longer passing rows to your function. The add_some_love(...) call inside .apply() is evaluated once, before apply ever runs, and what it receives is the whole columns df['state'] and df['score'], because that's what you told the computer to do.

What's going on in your function is the computer asking:

# if state_value == name ...
if df['state'] == 'NY':
    ...

Which naturally will raise your error, because df['state'] == 'NY' is a Series of booleans (one per row) and not a single boolean, as needed for the if statement.
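A minimal sketch of that failure, assuming a state column like the one in the question: comparing the column yields one boolean per row, and `if` cannot reduce that to a single True or False, whereas a single row's value compares fine.

```python
import pandas as pd

state = pd.Series(['NY', 'CA', 'TN'])

# Comparing a Series to a scalar gives a boolean Series, one value per row.
mask = state == 'NY'
print(mask.tolist())  # [True, False, False]

# Using that Series as an `if` condition is what raises the error.
try:
    if mask:
        pass
    raised = False
except ValueError as err:
    raised = True
    print(err)

# A single row's value compares to one plain boolean, so `if` works there.
first_is_ny = bool(state.iloc[0] == 'NY')
print(first_is_ny)  # True
```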

Answer:

Your add_some_love function needs a single string as input in order to execute the comparison if state_value == name. When you apply a lambda function over a DataFrame with axis=1, you can pass each row's cell values to the function, instead of the whole Series. You can't use that exact add_some_love function without a lambda function.

If you still want to use apply(), try this function:

def add_some_love(row, name):
    # row is a single row of the DataFrame, so row.state is one string
    if row.state == name:
        row.score = row.score + 50
    return row

df_2 = df_2.apply(add_some_love, axis=1, args=('NY',))
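If you would rather keep the original score and fill a new column, as in the question, a variation along these lines should work. This is only a sketch: love_for_row is a hypothetical name, not something from the question or from pandas. The key point is that apply receives the function object itself (no parentheses) and calls it once per row.

```python
import pandas as pd

df_2 = pd.DataFrame({
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300],
})

def love_for_row(row, name):  # hypothetical helper name
    # `row` is one row of the frame as a Series, so row['state'] is a single string.
    if row['state'] == name:
        return row['score'] + 50
    return row['score']

# Pass the function itself; apply calls it for each row and forwards
# the extra positional arguments given in `args`.
df_2['love_added'] = df_2.apply(love_for_row, axis=1, args=('NY',))

print(df_2['love_added'].tolist())  # [150, 200, 300]
```

Because the function returns one value per row, the result of apply is a Series that can be assigned straight to the new column.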

However, this should be the fastest and most efficient:

FILTER = df_2['state'] == 'NY'  # boolean mask: True for the NY rows
df_2.loc[FILTER, 'score'] = df_2.loc[FILTER, 'score'] + 50