Making a function work with vectorized operations in Pandas

I wrote my own function in Python. It is very simple; the data and the function are shown below:

    import numpy as np
    import pandas as pd

    data_1 = {'id': ['1', '2', '3', '4', '5'],
              'name': ['Company1', 'Company1', 'Company3', 'Company4', 'Company5'],
              'employee': [10, 3, 5, 1, 0],
              'sales': [100, 30, 50, 200, 0],
              }
    df = pd.DataFrame(data_1, columns=['id', 'name', 'employee', 'sales'])

    threshold_1 = 40
    threshold_2 = 50

And the function is written below:

    def my_function(employee, sales):
        conditions = [
            (sales == 0),
            (sales < threshold_1),
            (sales >= threshold_1 & employee <= threshold_2)]
        values = [0, sales*2, sales*4]
        sales_estimation = np.select(conditions, values)
        return sales_estimation

    df['new_column'] = df.apply(lambda x: my_function(x.employee, x.sales), axis=1)
    df

So this function works well and gives the expected result.

Now I want to write the same function as a vectorized operation across Pandas Series, because vectorized operations reduce execution time. I wrote the function below, but it does not work.

    def my_function1(
            pandas_series: pd.Series
            ) -> pd.Series:
        """
        Vectorized operation across Pandas Series
        """
        conditions = [
            (sales == 0),
            (sales < threshold_1),
            (sales >= threshold_1 & employee <= threshold_2)]
        values = [0, sales*2, sales*4]
        sales_estimation = np.select(conditions, values)
        return sales_estimation

    df['new_column_1'] = my_function1(data['employee','sales'])

My error is probably related to the input parameters of this function. Can anybody help me solve this problem and make my_function1 work?

CodePudding user response：

You need to slightly change one condition to be able to pass Series:

(sales >= threshold_1 & employee <= threshold_2)
# equivalent to
# sales >= (threshold_1 & employee) <= threshold_2

into:

(sales >= threshold_1) & (employee <= threshold_2)

because & binds more tightly than the comparison operators >= and <=, so the original condition was grouped incorrectly.
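The grouping can be checked with plain integers. This is a minimal sketch; the values sales=100 and employee=60 are made up for illustration (they are not in the question's data) and were chosen so that the two readings of the condition disagree.

```python
threshold_1, threshold_2 = 40, 50
sales, employee = 100, 60  # hypothetical values, not from the question's data

# & is evaluated before >= and <=, so Python first computes threshold_1 & employee
bitwise_part = threshold_1 & employee  # 0b101000 & 0b111100 == 40

# Without parentheses this is a chained comparison around the bitwise result
unparenthesized = sales >= threshold_1 & employee <= threshold_2
spelled_out = (sales >= bitwise_part) and (bitwise_part <= threshold_2)

# What the condition was meant to express
intended = (sales >= threshold_1) and (employee <= threshold_2)

print(bitwise_part, unparenthesized, spelled_out, intended)
```

Here the unparenthesized form is true even though employee exceeds threshold_2, while the intended form is false. The corrected function follows.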

def my_function(employee, sales):
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        (sales >= threshold_1) & (employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return sales_estimation

df['new_column'] = my_function(df['employee'], df['sales'])

output:

  id      name  employee  sales  new_column
0  1  Company1        10    100         400
1  2  Company1         3     30          60
2  3  Company3         5     50         200
3  4  Company4         1    200         800
4  5  Company5         0      0           0
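As a sanity check, the vectorized call can be compared against a row-by-row evaluation of the same function. The sketch below uses a plain Python loop as a stand-in for df.apply, and the name estimate is chosen here for illustration only.

```python
import numpy as np
import pandas as pd

threshold_1 = 40
threshold_2 = 50

df = pd.DataFrame({
    'employee': [10, 3, 5, 1, 0],
    'sales': [100, 30, 50, 200, 0],
})

def estimate(employee, sales):
    # Works for scalars and for whole Series alike
    conditions = [
        sales == 0,
        sales < threshold_1,
        (sales >= threshold_1) & (employee <= threshold_2),
    ]
    values = [0, sales * 2, sales * 4]
    return np.select(conditions, values)

# One call per row (slow path)
row_by_row = [int(estimate(e, s)) for e, s in zip(df['employee'], df['sales'])]

# One call for the whole columns (vectorized path)
vectorized = estimate(df['employee'], df['sales'])

print(row_by_row == vectorized.tolist())
```

Both paths should produce the values shown in the output table above.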

You can also pass the whole dataframe and select the columns inside the function:

def my_function(df):
    employee = df['employee']
    sales = df['sales']
    conditions = [
    (sales == 0 ),
    (sales < threshold_1), 
    (sales >= threshold_1) & (employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)    
    return (sales_estimation)

df['new_column'] = my_function(df)
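One detail the functions above leave implicit: np.select assigns its default value, 0, to any row that matches none of the conditions, for example a row with sales >= threshold_1 and employee > threshold_2. The sketch below is a variant under stated assumptions: the name estimate_sales, the thresholds passed as keyword arguments, the default parameter, and the extra demo row are all introduced here for illustration.

```python
import numpy as np
import pandas as pd

def estimate_sales(df, threshold_1=40, threshold_2=50, default=0):
    employee = df['employee']
    sales = df['sales']
    conditions = [
        sales == 0,
        sales < threshold_1,
        (sales >= threshold_1) & (employee <= threshold_2),
    ]
    values = [0, sales * 2, sales * 4]
    # Rows matching no condition receive `default`
    result = np.select(conditions, values, default=default)
    # Wrap in a Series so the result keeps the dataframe's index
    return pd.Series(result, index=df.index)

# Second row is hypothetical: employee is above threshold_2
demo = pd.DataFrame({'employee': [10, 60], 'sales': [100, 100]})

with_zero = estimate_sales(demo)                # unmatched row becomes 0
with_marker = estimate_sales(demo, default=-1)  # unmatched row becomes -1
print(with_zero.tolist(), with_marker.tolist())
```

Choosing a default that cannot be confused with a real estimate makes unmatched rows easier to spot.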

CodePudding user response：

To avoid ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). (caused by the priority of operators), pass the Series to the function as separate arguments and add () around each comparison:

def my_function1(employee, sales):
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        (sales >= threshold_1) & (employee <= threshold_2)]  # <- here
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return sales_estimation

df['new_column_1'] = my_function1(df['employee'], df['sales'])
print(df)
  id      name  employee  sales  new_column_1
0  1  Company1        10    100           400
1  2  Company1         3     30            60
2  3  Company3         5     50           200
3  4  Company4         1    200           800
4  5  Company5         0      0             0
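The ValueError can be reproduced in isolation. In this minimal sketch the two Series are built directly from the question's employee and sales columns; the variable names error_message and mask are introduced here for illustration.

```python
import pandas as pd

threshold_1 = 40
threshold_2 = 50
employee = pd.Series([10, 3, 5, 1, 0])
sales = pd.Series([100, 30, 50, 200, 0])

try:
    # Parsed as: sales >= (threshold_1 & employee) <= threshold_2,
    # a chained comparison that needs the truth value of a whole Series
    sales >= threshold_1 & employee <= threshold_2
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With parentheses, & combines two boolean Series element by element
mask = (sales >= threshold_1) & (employee <= threshold_2)

print(error_message)
print(mask.tolist())
```

The chained comparison implicitly uses `and`, which asks for a single True or False from a Series; that is where the error comes from.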