I wrote my own function in Python. It is very simple; the data and the function are shown below:
import numpy as np
import pandas as pd

data_1 = {'id': ['1', '2', '3', '4', '5'],
          'name': ['Company1', 'Company1', 'Company3', 'Company4', 'Company5'],
          'employee': [10, 3, 5, 1, 0],
          'sales': [100, 30, 50, 200, 0],
          }
df = pd.DataFrame(data_1, columns=['id', 'name', 'employee', 'sales'])
threshold_1 = 40
threshold_2 = 50
And the function is written below:
def my_function(employee, sales):
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        (sales >= threshold_1 & employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return (sales_estimation)

df['new_column'] = df.apply(lambda x: my_function(x.employee, x.sales), axis=1)
df
So this function works well and gives the expected result.
Now I want to write the same function as a vectorized operation on Pandas Series, because vectorized operations reduce execution time. I wrote the function below, but it does not work.
def my_function1(
    pandas_series: pd.Series
) -> pd.Series:
    """
    Vectorized operation across Pandas Series
    """
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        (sales >= threshold_1 & employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return sales_estimation

df['new_column_1'] = my_function1(data['employee','sales'])
My error is probably related to the input parameters of this function. Can anybody help me solve this problem and make my_function1 work?
CodePudding user response:
You need to change one condition slightly to be able to pass Series. Because & binds more tightly than the comparison operators,
(sales >= threshold_1 & employee <= threshold_2)
# is equivalent to
# sales >= (threshold_1 & employee) <= threshold_2
Change it into:
(sales >= threshold_1) & (employee <= threshold_2)
so that each comparison is evaluated before the two results are combined.
def my_function(employee, sales):
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        (sales >= threshold_1) & (employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return (sales_estimation)

df['new_column'] = my_function(df['employee'], df['sales'])
output:
   id      name  employee  sales  new_column
0   1  Company1        10    100         400
1   2  Company1         3     30          60
2   3  Company3         5     50         200
3   4  Company4         1    200         800
4   5  Company5         0      0           0
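The precedence issue described above can be checked in isolation. The following is a minimal sketch that rebuilds the two columns as standalone Series (the variable names `raised` and `mask` are only illustrative) and compares the condition with and without parentheses:

```python
import pandas as pd

threshold_1 = 40
threshold_2 = 50
sales = pd.Series([100, 30, 50, 200, 0])
employee = pd.Series([10, 3, 5, 1, 0])

# Without parentheses, `&` is evaluated first, so Python sees the chained
# comparison  sales >= (threshold_1 & employee) <= threshold_2.
# A chained comparison uses `and` internally, which needs the truth value
# of a whole Series and therefore fails.
try:
    sales >= threshold_1 & employee <= threshold_2
    raised = False
except ValueError:
    raised = True

# With parentheses, each comparison yields a boolean Series and `&`
# combines the two element-wise.
mask = (sales >= threshold_1) & (employee <= threshold_2)

print(raised)
print(mask.tolist())
```

With scalar inputs, as in the row-wise apply version, the unparenthesized form does not raise, which is likely why the problem only surfaced once whole Series were passed in.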
You can also pass the whole DataFrame and subset the columns inside the function:
def my_function(df):
    employee = df['employee']
    sales = df['sales']
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        (sales >= threshold_1) & (employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return (sales_estimation)

df['new_column'] = my_function(df)
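One detail the sample data does not exercise: the three conditions do not cover a row where sales is at least threshold_1 but employee exceeds threshold_2. In that case np.select falls back to its `default` argument, which is 0 unless specified. A small sketch, using a hypothetical second row that is not in the original data and an arbitrary marker value of -1:

```python
import numpy as np
import pandas as pd

threshold_1 = 40
threshold_2 = 50
# Hypothetical rows: the second one has employee above threshold_2
sales = pd.Series([100, 100])
employee = pd.Series([10, 60])

conditions = [
    sales == 0,
    sales < threshold_1,
    (sales >= threshold_1) & (employee <= threshold_2),
]
values = [0, sales * 2, sales * 4]

# No condition matches the second row, so it receives the default (0)
implicit = np.select(conditions, values)
# An explicit default makes unmatched rows easy to spot; -1 is arbitrary
explicit = np.select(conditions, values, default=-1)

print(implicit.tolist())
print(explicit.tolist())
```

Whether 0 is the right fallback depends on the intended business rule, which the question does not state.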
CodePudding user response:
Pass each Series to the function and add parentheses around the comparisons. Without them, operator precedence leads to ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). The corrected function:
def my_function1(employee, sales):
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        (sales >= threshold_1) & (employee <= threshold_2)]  # <- here
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return sales_estimation

df['new_column_1'] = my_function1(df['employee'], df['sales'])
print(df)
   id      name  employee  sales  new_column_1
0   1  Company1        10    100           400
1   2  Company1         3     30            60
2   3  Company3         5     50           200
3   4  Company4         1    200           800
4   5  Company5         0      0             0
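Since the question was motivated by speed, here is a rough way to confirm that the vectorized call matches the row-wise apply and to compare their run times. This is only a sketch: the name `estimate`, the frame built by repeating the five sample rows 200 times, and the repeat count of 3 are all assumptions made for illustration, and the timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

threshold_1 = 40
threshold_2 = 50

def estimate(employee, sales):
    # Works for scalars and for whole Series alike
    conditions = [
        sales == 0,
        sales < threshold_1,
        (sales >= threshold_1) & (employee <= threshold_2),
    ]
    values = [0, sales * 2, sales * 4]
    return np.select(conditions, values)

# Hypothetical larger frame: the five sample rows repeated 200 times
df = pd.DataFrame({
    'employee': [10, 3, 5, 1, 0] * 200,
    'sales': [100, 30, 50, 200, 0] * 200,
})

def row_wise():
    # One np.select call per row; int() unwraps the 0-d array result
    return df.apply(lambda x: int(estimate(x.employee, x.sales)), axis=1)

def vectorized():
    # A single np.select call over the whole columns
    return estimate(df['employee'], df['sales'])

# Both approaches should produce identical values
same = np.array_equal(row_wise().to_numpy(), vectorized())
print(same)

# Total seconds for 3 runs of each approach
print(timeit.timeit(row_wise, number=3))
print(timeit.timeit(vectorized, number=3))
```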