Edit: Solutions posted in this notebook. Special thanks to Étienne Célèry and ifly6!
I am trying to figure out how to beat the feared error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
import pandas as pd

d = {
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300]
}
df = pd.DataFrame(data=d)
df_2 = df.copy()  # for use in the second, non-lambda part
print(df)
And this outputs:
          nickname state  score
0           bobg89    NY    100
1        coolkid34    CA    200
2  livelaughlove38    TN    300
Then the goal is to add 50 to the score if they are from NY.
def add_some_love(state_value, score_value, name):
    if state_value == name:
        return score_value + 50
    else:
        return score_value
Then we can apply that function with a lambda function.
df['love_added'] = df.apply(lambda x: add_some_love(x.state, x.score, 'NY'), axis=1)
print(df)
And that gets us:
          nickname state  score  love_added
0           bobg89    NY    100         150
1        coolkid34    CA    200         200
2  livelaughlove38    TN    300         300
And here is where I tried writing it, without the lambda, and that's where I get the error.
It seems like @MSeifert's answer here explains why this happens (the function is looking at a whole column instead of a single row's value in that column), but I also thought that passing axis=1 into the .apply() method would apply the function row-wise and fix the problem.
So I then do this:
df_2['love_added'] = df_2.apply(add_some_love(df_2.state, df_2.score, 'NY'), axis=1)
print(df_2)
And then you get the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So I've tried these solutions, but I can't seem to figure out how to rewrite the add_some_love() function so that it can run properly without the lambda function.
Does anyone have any advice?
Thanks so much for your time and consideration.
CodePudding user response:
What you could do instead is use np.where:
df['score'] = np.where(df['state'] == 'NY', df['score'] + 50, df['score'])
This would produce the same outcome as your applied function while also being much more performant.
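As a self-contained sketch of the np.where approach: the version below writes to a new love_added column rather than overwriting score. That is an assumption made here so the result matches the question's expected output; the sample data is copied from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300],
})

# Element-wise choice: score + 50 where state is 'NY', otherwise score unchanged.
# The comparison and the addition both operate on whole columns at once.
df['love_added'] = np.where(df['state'] == 'NY', df['score'] + 50, df['score'])
print(df)
```

No if statement is evaluated on a Series here, which is why the ambiguity error cannot occur.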
The issue you have without the lambda function is that you are not actually passing the rows to your function. Python evaluates add_some_love(df_2.state, df_2.score, 'NY') before apply() ever runs, so what you're actually passing is the whole columns df_2['state'] and df_2['score'], because that's what you told the computer to do.
What's going on in your function is the computer asking:
# if state_value == name ...
if df_2['state'] == 'NY':
    ...
Which naturally will raise your error, because df_2['state'] == 'NY' is a Series of boolean values and not a single boolean value, as needed for the if statement.
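The failure can be reproduced in isolation, without apply() or add_some_love at all. This is only a minimal sketch; the names mask and error_message are introduced here for illustration.

```python
import pandas as pd

df_2 = pd.DataFrame({
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300],
})

# Comparing a whole column to a string yields one boolean per row.
mask = df_2['state'] == 'NY'
print(mask.tolist())  # [True, False, False]

error_message = None
try:
    # `if` needs a single True/False, so pandas refuses to guess.
    if mask:
        pass
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

The message's suggestions, such as a.any() or a.all(), collapse the Series to one boolean. That silences the error but would not give a per-row result here.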
CodePudding user response:
Your add_some_love function needs a string input in order to execute the comparison if state_value == name. When you apply a lambda function over a DataFrame with axis=1, you can pass the individual cells of each row to the function, instead of the whole Series.
You can't use that exact add_some_love function with apply() without a lambda function (or an equivalent wrapper).
If you still want to use apply(), try this function:
def add_some_love(row, name):
    # row is one DataFrame row, passed as a Series when axis=1
    if row.state == name:
        row.score = row.score + 50
    return row

df_2 = df_2.apply(add_some_love, axis=1, args=('NY',))
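If the aim is to keep the original three-argument add_some_love untouched, one possible sketch is a small named wrapper that unpacks each row. The helper add_some_love_to_row is hypothetical and introduced only for this example. It plays the same role as the lambda in the question, and the zip variant skips apply() entirely.

```python
import pandas as pd

def add_some_love(state_value, score_value, name):
    # Original function from the question, with scalar inputs.
    if state_value == name:
        return score_value + 50
    else:
        return score_value

def add_some_love_to_row(row, name):
    # Hypothetical helper: unpack one row and forward scalars to add_some_love.
    return add_some_love(row['state'], row['score'], name)

df_2 = pd.DataFrame({
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300],
})

# apply() calls the wrapper once per row; args supplies the extra 'NY' argument.
df_2['love_added'] = df_2.apply(add_some_love_to_row, axis=1, args=('NY',))

# Alternative without apply(): walk the two columns in lockstep.
love_added_zip = [
    add_some_love(state, score, 'NY')
    for state, score in zip(df_2['state'], df_2['score'])
]
print(df_2)
```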
However, this should be the fastest and most efficient:
FILTER = df_2['state'] == 'NY'  # boolean mask selecting the NY rows
df_2.loc[FILTER, 'score'] = df_2.loc[FILTER, 'score'] + 50