I have written a function that takes four input values and returns a result based on them:
def python_function(a, b, c, d):
    if [a, b, c, d].count(0) == 4:
        return "NA"
    average = (a + b + c + d) / (4 - [a, b, c, d].count(0))
    # change to a for q1, b for q2, c for q3, d for q4
    if c >= average:
        if c > b:
            return "G"
        else:
            return "S"
    elif c < average:
        return "B"
    return "NA"
Calling the function above:
python_function(5.3, 9.7, .4, 0)
'B'
python_function(5.3, 9.7, 10.4, 0)
'G'
However, when I apply the same function to the columns of a pandas DataFrame, I get an error. I am sure there is a way to make the logical operators handle the float columns, but I am not sure how to do that.
DataFrame:
   q1_profit  q2_profit  q3_profit  q4_profit
0    89969.7   112896.7    25665.4          0
1        1.6      459.9      295.9          0
2        0.9        9.5        5.3          0
3     1396.1     1105.2        0.2          0
4       17.9      365.5      191.1          0
Data types:
q1_profit    1600 non-null float64
q2_profit    1600 non-null float64
q3_profit    1600 non-null float64
q4_profit    1600 non-null int64
data["rating"] = python_function(data["q1_profit"],data["q2_profit"],data["q3_profit"],data["q4_profit"])
Error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-51-6dba2870dd9c> in <module>
----> 1 data["rating"] = python_function(data["q1_profit"],data["q2_profit"],data["q3_profit"],data["q4_profit"])
<ipython-input-39-47792387b172> in python_function(a, b, c, d)
      1 def python_function(a, b, c, d):
----> 2     if [a, b, c, d].count(0) == 4:
      3         return "NA"
      4
      5     average = (a + b + c + d) / (4 - [a, b, c, d].count(0))
~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1476 raise ValueError("The truth value of a {0} is ambiguous. "
1477 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1478 .format(self.__class__.__name__))
1479
1480 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
CodePudding user response:
It looks like you want to run the function on every row of the DataFrame, so the best option is to use apply with axis=1, which passes one row at a time to the function.
import pandas as pd

input_data = {
    'q1_profit': [89969.7, 1.6, 0.9, 1396.1, 17.9],
    'q2_profit': [112896.7, 459.9, 9.5, 1105.2, 365.5],
    'q3_profit': [25665.4, 295.9, 5.3, 0.2, 191.1],
    'q4_profit': [0, 0, 0, 0, 0],
}
data = pd.DataFrame(data=input_data)
# axis=1 calls the lambda once per row, so python_function receives scalars
data['rating'] = data.apply(
    lambda row: python_function(row["q1_profit"], row["q2_profit"], row["q3_profit"], row["q4_profit"]),
    axis=1,
)
print(data)
Output:
   q1_profit  q2_profit  q3_profit  q4_profit rating
0    89969.7   112896.7    25665.4          0      B
1        1.6      459.9      295.9          0      S
2        0.9        9.5        5.3          0      S
3     1396.1     1105.2        0.2          0      B
4       17.9      365.5      191.1          0      B
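If the row-wise apply turns out to be slow on a larger frame, the same rules can probably be expressed with whole-column operations instead. The following is only a sketch under assumptions: rate_vectorized is a made-up name, the column names are the ones from the question, and numpy.select is swapped in for the if/elif chain, so this is an alternative to the apply approach, not the method used above.

```python
import numpy as np
import pandas as pd


def rate_vectorized(df):
    """Column-wise variant of python_function (hypothetical helper name)."""
    cols = ["q1_profit", "q2_profit", "q3_profit", "q4_profit"]
    values = df[cols]
    # Per-row count of non-zero quarters (the "4 - count(0)" part).
    nonzero = values.ne(0).sum(axis=1)
    # Rows with no non-zero quarter get NaN here instead of a division by zero.
    average = values.sum(axis=1) / nonzero.where(nonzero > 0)
    b, c = df["q2_profit"], df["q3_profit"]
    # np.select picks the first condition that is True for each row.
    conditions = [
        nonzero.eq(0),             # all four values are zero
        (c >= average) & (c > b),
        c >= average,
        c < average,
    ]
    choices = ["NA", "G", "S", "B"]
    rating = np.select(conditions, choices, default="NA")
    return pd.Series(rating, index=df.index)


data = pd.DataFrame({
    "q1_profit": [89969.7, 1.6, 0.9, 1396.1, 17.9],
    "q2_profit": [112896.7, 459.9, 9.5, 1105.2, 365.5],
    "q3_profit": [25665.4, 295.9, 5.3, 0.2, 191.1],
    "q4_profit": [0, 0, 0, 0, 0],
})
data["rating"] = rate_vectorized(data)
print(data)
```

The element-wise & operator replaces the nested if, which is what avoids asking pandas for the truth value of a whole Series.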
CodePudding user response:
data["rating"] = data.apply(
    lambda x: python_function(x.q1_profit, x.q2_profit, x.q3_profit, x.q4_profit),
    axis=1,
)
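As for why the original call fails: when whole columns are passed in, a, b, c and d are Series, and [a, b, c, d].count(0) has to decide whether each Series "equals" 0. The comparison itself is element-wise and returns a boolean Series, and Python then asks for a single True/False from it, which pandas refuses to give. A small sketch of that behaviour, using a made-up three-element Series named s:

```python
import pandas as pd

s = pd.Series([0.0, 1.5, 0.0])

# Element-wise comparison: one boolean per element, not a single bool.
mask = s == 0
print(mask.tolist())

try:
    # list.count compares each item with 0 and needs one truth value,
    # which is ambiguous for a multi-element Series.
    [s].count(0)
except ValueError as exc:
    message = str(exc)
    print(message)
```

With apply and axis=1 the function only ever sees scalar values from one row, so the plain if statements work unchanged.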