Condition is ignored using .where() in Pandas-CodePudding

I'm trying to calculate the Bonus Pay given to our Courier's based on their Success Rate which is based off the number of Eligible orders they've successfully delivered.

Here's my code:

from openpyxl import load_workbook
import pandas as pd

df = pd.read_excel(r'path\filename.xlsx')


df['DeliveredAt'] = pd.to_datetime(df['DeliveredAt'].astype(str))

df['Date'] = df['DeliveredAt'].dt.strftime('%d/%m/%y')

df['Total Orders'] = df['OrderNumber']

df['Total Orders'] = 1

df['Eligible'] = df['DeliveryOnTime'].isin(["On-time", "Early"])

df['Success Rate'] = (df['Eligible']/df['Total Orders'])*100

df['Bonus'] = df['Eligible'].mul(1.2).where(df['Success Rate'] >= 95)

per_day = df.groupby(['Date', 'Courier']).agg({'Success Rate': 'mean', 'Total Orders': 'count', 'Eligible': 'sum', 'Incentive': 'sum'})

per_courier = df.groupby('Courier').agg({'Total Orders': 'count', 'Incentive': 'sum'})

I'm having issues with

df['Bonus'] = df['Eligible'].mul(1.2).where(df['Success Rate'] >= 95)

It seems to ignore the condition that Success Rate has to be over 95 for it to qualify for a Bonus. Right now, each row has a Bonus pay which is calculated from the number of Eligible * 1.2 regardless of the success rate value.

Sample Data:

data = {'ID': [1, 1, 1, 2, 2, 3, 4, 5, 5],
         'DeliveryOnTime': ["On-time", "Late", "Early", "On-time", "On-time", "Late", "Early", "Early", "Late"],
      }

df = pd.DataFrame(data)

#For the sake of example data, the count of `DeliveryOnTime` will be the total number of orders. 
df['Total Orders'] = df['DeliveryOnTime'].count() 
df['Eligible'] = df['DeliveryOnTime'].isin(["On-time", "Early"])
df['Success Rate'] = (df['Eligible']/df['Total Orders'])*100
df['Bonus'] = df['Eligible'].mul(1.2).where(df['Success Rate'] >= 95)

Current Output

    ID       Eligible      Total Orders    Success Rate   Bonus
     1           2              3            66.66         2.4
     2           2              2             100          2.4
     3           0              1              0            0
     4           1              1             100          1.2
     5           1              2              50          1.2

This is wrong as ID 1 and 5 should not have any Bonus at all with the Success Rate being lower than 95.

Expected Output

ID       Eligible     Total Orders    Success Rate    Bonus
 1           2              3            66.66          0
 2           2              2             100          2.4
 3           0              1              0            0
 4           1              1             100          1.2
 5           1              2              50           0

I'm not sure if it's something I have to change in the Per_day or Per_courier section of the .groupby() as that seems to be where most of my issues have occurred when writing this code.

Thanks

CodePudding user response：

As per what I understood, you need to find Bonus based on the success rate of the delivery boy. For that first you need to filter out with the eligible one.

Considering your own example

data = {'ID': [1, 1, 1, 2, 2, 3, 4, 5, 5],
         'DeliveryOnTime': ["On-time", "Late", "Early", "On-time", "On-time", "Late", "Early", "Early", "Late"],
      }

df = pd.DataFrame(data)

df['Eligible'] = df['DeliveryOnTime'].isin(["On-time", "Early"])
df['Total Orders'] = df['DeliveryOnTime'].count() 

df.replace([True,False], [1,0],inplace=True)
per_courier_eligible = df.groupby('ID').agg({'Eligible': 'sum','Total Orders':'count'})
per_courier_eligible['Success Rate'] = per_courier_eligible['Eligible']/per_courier_eligible['Total Orders']
per_courier_eligible['Bonus'] = per_courier_eligible['Eligible']
per_courier_eligible['Bonus'].loc[(per_courier_eligible['Success Rate'] >= 0.95)] = per_courier_eligible['Eligible']   1.2
print(per_courier_eligible)

CodePudding user response：

Is the problem with the order of .mul and .where?

This worked when I tried to reproduce:

df['Bonus'] = df['Eligible'].where(df['Success Rate'] >= 95).mul(1.2)

CodePudding user response：

I used an element of what @Rupal Shah said by filtering it per ID for the Success Rate and Bonus pay to work.

All I had to do was create a .groupby():

ID = df.groupby(['Urber', 'Date']).agg({'Total Orders':'sum','Eligible': 'sum'})

After this, I changed any following syntaxes with calculations from df. to ID.e.g.

ID['Success Rate'] = (ID['Eligible']/ID['Total Orders'])*100