df.loc() doesn't seem to be working or filtering out the rows I need-CodePudding

I'm trying to figure out how to use df.loc to calculate the number of orders per courier that are eligible for a bonus pay.

df['Eligible'] = df.loc[(df['DeliveryOnTime'] == "On-time") & (df['DeliveryOnTime'] == "Early"), 'Total Orders'].sum()

So what I want to do is to create a new column called Eligible using data from DeliveryOnTime with rows that have On-time and Early to perform a total count.

This information will be used to calculate the bonus pay:

df['Success Rate'] = df['Eligible']/df['Total Orders']

df['Bonus'] = df['Eligible'].mul(1.2).where(df['Success Rate'] >= 0.95)

It seems that the Eligible code isn't working as the number of Total Orders and Eligible are exactly the same for every ID when there should be differences as some deliveries will be marked as Late in DeliveryOnTime

Sample Data

df = {'ID': [1, 1, 1, 2, 2, 3, 4, 5, 5],
         'DeliveryOnTime': ["On-time", "Late", "Early", "On-time", "On-time", "Late", "Early", "Early"],
      }

df = pd.DataFrame(data)

#For the sake of example data, the count of `DeliveryOnTime` will be the total number of orders. 
df['Total Orders'] = df['DeliveryOnTime'].count()

So in this example: ID 1's success rate would be 66.66 with 2 out of 3 orders being eligible for bonus pay.

Only ID's 2 and 5 would qualify as their success rate would be over 95.

CodePudding user response：

Will this helps?

data = {'ID': [1, 1, 1, 2, 2, 3, 4, 5, 5],
         'DeliveryOnTime': ["On-time", "Late", "Early", "On-time", "On-time", "Late", "Late", "Early", "Early"],
      }
data = pd.DataFrame(data)

data['Eligible']=data['DeliveryOnTime'].apply(lambda x: x in ['On-time', 'Early'])
g = data.groupby(['ID'])
success_rate_by_ID = g['Eligible'].apply(lambda x: sum(x)/len(x))
print(success_rate_by_ID)

CodePudding user response：

use the dataframe transform to create an eligible column value then sum the column. Group by courier id and filter where eligible equals 1 and count the index