How to use and apply a function with a pandas DataFrame in Python

Time:11-18

I have written a function that takes four input values and returns a result based on them:

def python_function(a, b, c, d):
    if [a, b, c, d].count(0) == 4:
        return "NA"

    average = (a + b + c + d) / (4 - [a, b, c, d].count(0))

    # change to a for q1, b for q2, c for q3, d for q4
    if c >= average:
        if c > b:
            return "G"
        else:
            return "S"
    elif c < average:
        return "B"

    return "NA"

Calling the function above:

python_function(5.3,9.7,.4,0)

'B'

python_function(5.3,9.7,10.4,0)

'G'

However, when I apply the same function to the columns of a pandas DataFrame, I get an error. I am sure there is a way to handle the float values in the comparisons, but I am not sure how to do it.

DataFrame:

   q1_profit    q2_profit   q3_profit   q4_profit
0   89969.7     112896.7    25665.4     0
1   1.6         459.9       295.9       0
2   0.9         9.5         5.3         0
3   1396.1      1105.2      0.2         0
4   17.9        365.5       191.1       0

Data types:

q1_profit            1600 non-null float64
q2_profit            1600 non-null float64
q3_profit            1600 non-null float64
q4_profit            1600 non-null int64




data["rating"] = python_function(data["q1_profit"],data["q2_profit"],data["q3_profit"],data["q4_profit"])

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-51-6dba2870dd9c> in <module>
----> 1 data["rating"] = python_function(data["q1_profit"],data["q2_profit"],data["q3_profit"],data["q4_profit"])

<ipython-input-39-47792387b172> in python_function(a, b, c, d)
      1 def python_function(a, b, c, d):
----> 2     if [a, b, c, d].count(0) == 4:
      3         return "NA"
      4 
      5     average = (a + b + c + d) / (4 - [a, b, c, d].count(0))

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1476         raise ValueError("The truth value of a {0} is ambiguous. "
   1477                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1478                          .format(self.__class__.__name__))
   1479 
   1480     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

CodePudding user response:

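It looks like you want to run the function on every row of the DataFrame. The simplest option is DataFrame.apply with axis=1, which passes one row at a time, so the function receives single numbers instead of whole Series.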

input_data = {
     'q1_profit':[89969.7,1.6,0.9,1396.1 ,17.9 ],
     'q2_profit':[112896.7, 459.9,9.5,1105.2 , 365.5],
     'q3_profit' :[25665.4,295.9 ,5.3,0.2, 191.1],
     'q4_profit':[0,0,0,0,0]
      }

import pandas as pd 
data = pd.DataFrame(data=input_data)
 
data['rating'] = data.apply(lambda row: python_function(row["q1_profit"],row["q2_profit"],row["q3_profit"],row["q4_profit"]), axis=1)

print(data)

output:

   q1_profit  q2_profit  q3_profit  q4_profit rating
0    89969.7   112896.7    25665.4          0      B
1        1.6      459.9      295.9          0      S
2        0.9        9.5        5.3          0      S
3     1396.1     1105.2        0.2          0      B
4       17.9      365.5      191.1          0      B
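
For context on why the direct call in the question fails, here is a minimal sketch, assuming only pandas is installed. list.count(0) compares each list item with 0 and needs a single True/False answer, but comparing a Series with 0 yields a whole boolean Series, so pandas raises the error shown in the traceback. The Series s below is made up for illustration.

```python
import pandas as pd

# Hypothetical two-element Series, standing in for one DataFrame column.
s = pd.Series([0.0, 1.5])

# Comparing a Series with 0 gives one boolean per element, not a single bool.
mask = s == 0
print(mask.tolist())

# list.count(0) evaluates `item == 0` as a single truth value for each item,
# which pandas refuses to do for a multi-element Series.
try:
    [s].count(0)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

With apply(..., axis=1) the function is called once per row, so a, b, c and d are plain numbers and the comparisons behave as in the scalar calls from the question.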

CodePudding user response:

data["rating"] = data.apply(lambda x: python_function(x.q1_profit, x.q2_profit, x.q3_profit, x.q4_profit),
                            axis=1)
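
If the row-wise apply turns out to be slow on a larger frame, a vectorized version is one possible alternative. The sketch below is not from either answer. It assumes NumPy is available and tries to reproduce the same rules with numpy.select, using the five sample rows from the question. The helper names (quarters, nonzero, average, conditions) are illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
data = pd.DataFrame({
    "q1_profit": [89969.7, 1.6, 0.9, 1396.1, 17.9],
    "q2_profit": [112896.7, 459.9, 9.5, 1105.2, 365.5],
    "q3_profit": [25665.4, 295.9, 5.3, 0.2, 191.1],
    "q4_profit": [0, 0, 0, 0, 0],
})

quarters = data[["q1_profit", "q2_profit", "q3_profit", "q4_profit"]]

# Count of non-zero quarters per row (the 4 - count(0) term in the function).
nonzero = (quarters != 0).sum(axis=1)

# Average over non-zero quarters; NaN where every quarter is zero,
# which makes the comparisons below False for those rows.
average = quarters.sum(axis=1) / nonzero.where(nonzero > 0)

b, c = data["q2_profit"], data["q3_profit"]

# np.select picks the first matching condition, mirroring the if/elif order.
conditions = [
    nonzero == 0,              # all four values are zero
    (c >= average) & (c > b),  # at or above average and higher than q2
    c >= average,              # at or above average, not higher than q2
    c < average,               # below average
]
data["rating"] = np.select(conditions, ["NA", "G", "S", "B"], default="NA")

print(data)
```

On the sample rows this should give the same ratings as the apply-based answers. For a frame of 1600 rows the difference in speed is unlikely to matter, so the apply version may be the more readable choice.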