I've tried to simplify my problem to the bare bones in the example below. I am attempting to apply
a function to a pandas DataFrame (much more complex than the one below), but the function contains an if statement that throws a ValueError:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I handle passing a series to this lambda function without incurring this error?
def shot_test(make, att):
    if att > 75:
        return make / att
    else:
        return 0

f = lambda x: np.where(x.total > 30, shot_test(x.make, x.att), 0)
df['P'] = df.apply(f, axis=1)
CodePudding user response:
I used some made-up data, but if I understand correctly, this should get you what you are looking for.
import pandas as pd

def shot_test(make, att):
    if att > 75:
        return make / att
    else:
        return 0

trips = {'Column1': [0, 2, 19, 15, 0, 23, 0, 0, 10, 0],
         'Column2': [1, 2, 15, 1, 4, 22, 1, 0, 143, 5],
         'Column3': [2, 1, 54, 543, 34, 243, 7, 0, 213, 5]}
df = pd.DataFrame(trips)
# Call shot_test row by row, but only where Column1 meets the threshold.
df['Lambda_Test'] = df.apply(lambda x: shot_test(x['Column2'], x['Column3']) if x['Column1'] >= 10 else 0, axis=1)
df
This lets you pass multiple columns as arguments to the shot_test function while also testing whether a separate column meets a certain threshold.
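If the row-wise apply turns out to be slow on a larger frame, the same two-condition logic can probably be expressed on whole columns instead. The following is only a sketch: the data is made up, and the column names total, make and att are taken from the question rather than from this answer.

```python
import numpy as np
import pandas as pd

# Made-up data using the column names from the question.
df = pd.DataFrame({
    "total": [25, 50, 60, 40],
    "make": [300, 500, 1000, 45],
    "att": [100, 100, 50, 90],
})

# Both conditions must hold for the ratio to be used; otherwise the result is 0.
use_ratio = (df["total"] > 30) & (df["att"] > 75)
df["P"] = np.where(use_ratio, df["make"] / df["att"], 0.0)
print(df)
```

Note that np.where evaluates make / att for every row before selecting, so a column containing zeros in att would need extra care.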
CodePudding user response:
Try this:
df.loc[df['att'] > 75, 'P'] = df['make'] / df['att']
df.loc[df['att'] <= 75, 'P'] = 0
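The boolean-mask idea above can be sketched with df.loc, which avoids chained indexing. This version also includes the total > 30 condition from the question, which is my assumption about the intended result. The data is made up.

```python
import pandas as pd

# Made-up data; the last row has att == 0 to show that it is never divided.
df = pd.DataFrame({
    "total": [25, 50, 60, 40],
    "make": [300, 500, 1000, 10],
    "att": [100, 100, 50, 0],
})

# Start from the default value, then overwrite only the rows that qualify.
df["P"] = 0.0
mask = (df["att"] > 75) & (df["total"] > 30)
df.loc[mask, "P"] = df.loc[mask, "make"] / df.loc[mask, "att"]
print(df)
```

Because the division is restricted to the masked rows, rows with att equal to 0 are simply left at the default.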
CodePudding user response:
Here are two example uses of your code, one which works and one which generates your error:
import pandas as pd
import numpy as np

def shot_test(make, att):
    if att > 75:
        return make / att
    else:
        return 0

f = lambda x: np.where(x.total > 30, shot_test(x.make, x.att), 0)

print("\nTest #1:")
df = pd.DataFrame({'total':[25,50,60], 'make':[300,500,1000], 'att':[100,100,50]})
print(df)
df['P'] = df.apply(f, axis=1)
print(df)

print("\nTest #2:")
df = pd.DataFrame({'total':[25,50,60], 'make':[300,500,1000], 'att':[pd.Series([100,50]),pd.Series([100,50]),pd.Series([50,100])]})
print(df)
df['P'] = df.apply(f, axis=1)
print(df)
Output:
Test #1:
   total  make  att
0     25   300  100
1     50   500  100
2     60  1000   50
   total  make  att    P
0     25   300  100  0.0
1     50   500  100  5.0
2     60  1000   50    0
Test #2:
   total  make                             att
0     25   300  0    100
1     50
dtype: int64
1     50   500  0    100
1     50
dtype: int64
2     60  1000  0     50
1    100
dtype: int64
Traceback (most recent call last):
  File "XXX.py", line 23, in <module>
    df['P'] = df.apply(f, axis=1)
  File "YYY\Python\Python310\lib\site-packages\pandas\core\frame.py", line 8833, in apply
    return op.apply().__finalize__(self, method="apply")
  File "YYY\Python\Python310\lib\site-packages\pandas\core\apply.py", line 727, in apply
    return self.apply_standard()
  File "YYY\Python\Python310\lib\site-packages\pandas\core\apply.py", line 851, in apply_standard
    results, res_index = self.apply_series_generator()
  File "YYY\Python\Python310\lib\site-packages\pandas\core\apply.py", line 867, in apply_series_generator
    results[i] = self.f(v)
  File "XXX.py", line 11, in <lambda>
    f = lambda x: np.where(x.total > 30, shot_test(x.make, x.att), 0)
  File "XXX.py", line 6, in shot_test
    if att > 75:
  File "YYY\Python\Python310\lib\site-packages\pandas\core\generic.py", line 1535, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
As you can see, in the second example each value in the att column of the dataframe is itself a pandas Series, and this triggers the error on this line:
    if att > 75:
If the data in the att column can easily be transformed from a Series to a scalar, you can do that conversion first, or modify the line above so the comparison is unambiguous (for example with .any() or .all(), as the error message suggests). However, if att is indeed supposed to be a Series (or another array-like structure) with multiple values, you may need to rethink the logic of your code.
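To make the point concrete, here is a small sketch that reproduces the error outside of apply and shows two possible ways around it. The Series values are made up, and taking the first element in the second option is purely an assumption for illustration; which reduction is right depends on what att is meant to hold.

```python
import pandas as pd

att = pd.Series([100, 50])

# Using a Series directly in an `if` raises the error from the question.
try:
    if att > 75:
        pass
except ValueError as exc:
    message = str(exc)
    print(message)

# Option 1: reduce the element-wise comparison to one boolean explicitly.
any_over = bool((att > 75).any())  # True: at least one value exceeds 75
all_over = bool((att > 75).all())  # False: 50 does not exceed 75

# Option 2: if each cell should really hold a single number, extract it first.
# Using the first element here is an assumption, not a general rule.
first_att = att.iloc[0]
print(any_over, all_over, first_att > 75)
```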