I'm trying to figure out how to use df.loc
to calculate the number of orders per courier that are eligible for a bonus pay.
df['Eligible'] = df.loc[(df['DeliveryOnTime'] == "On-time") & (df['DeliveryOnTime'] == "Early"), 'Total Orders'].sum()
So what I want to do is to create a new column called Eligible
using data from DeliveryOnTime
with rows that have On-time
and Early
to perform a total count.
This information will be used to calculate the bonus pay:
df['Success Rate'] = df['Eligible']/df['Total Orders']
df['Bonus'] = df['Eligible'].mul(1.2).where(df['Success Rate'] >= 0.95)
It seems that the Eligible
code isn't working as the number of Total Orders
and Eligible
are exactly the same for every ID when there should be differences as some deliveries will be marked as Late
in DeliveryOnTime
Sample Data
df = {'ID': [1, 1, 1, 2, 2, 3, 4, 5, 5],
'DeliveryOnTime': ["On-time", "Late", "Early", "On-time", "On-time", "Late", "Early", "Early"],
}
df = pd.DataFrame(data)
#For the sake of example data, the count of `DeliveryOnTime` will be the total number of orders.
df['Total Orders'] = df['DeliveryOnTime'].count()
So in this example: ID 1's success rate would be 66.66 with 2 out of 3 orders being eligible for bonus pay.
Only ID's 2 and 5 would qualify as their success rate would be over 95.
CodePudding user response:
Will this helps?
data = {'ID': [1, 1, 1, 2, 2, 3, 4, 5, 5],
'DeliveryOnTime': ["On-time", "Late", "Early", "On-time", "On-time", "Late", "Late", "Early", "Early"],
}
data = pd.DataFrame(data)
data['Eligible']=data['DeliveryOnTime'].apply(lambda x: x in ['On-time', 'Early'])
g = data.groupby(['ID'])
success_rate_by_ID = g['Eligible'].apply(lambda x: sum(x)/len(x))
print(success_rate_by_ID)
CodePudding user response:
use the dataframe transform to create an eligible column value then sum the column. Group by courier id and filter where eligible equals 1 and count the index