How to sum up a dataframe column based on another column value-CodePudding

so i have data about salaries where the columns are:

Index(['Id', 'EmployeeName', 'JobTitle', 'BasePay', 'OvertimePay', 'OtherPay',
   'Benefits', 'TotalPay', 'TotalPayBenefits', 'Year', 'Notes', 'Agency',
   'Status'],
  dtype='object')

where i created two extra columns to check whether their jobTitle has to do with police or fire fighter department

  def find_police(x):
    return "POLICE" in x
def find_fire(x):
    return 'FIRE' in x


# use apply to search for it in JobTitle
sf_sal["isPolice"] = sf_sal["JobTitle"].apply(find_police)
sf_sal["isFire"] = sf_sal["JobTitle"].apply(find_fire)

sf_sal[["JobTitle", "isPolice", "isFire"]]

So what i would like to do is compare the mean salaries of the police department and firefigher department, i get the ratio of police to firefighters

ratio_of_police = sf_sal["isPolice"].sum()
ratio_of_fire = sf_sal["isFire"].sum()

by using the basesalary columns, i want to sum up all the rows that have true next to the ispolice column and the same with firefighters

One way i attempted this was

sf_mask = sf_sal['isPolice'] == True
all_police = sf_sal[sf_mask]
sum_of_base_salary = all_police['BasePay'].sum()
print(sum_of_base_salary/ratio_of_police)
sf_mask = sf_sal['isFire'] == True
all_fire = sf_sal[sf_mask]
sum_of_base_salary = all_fire['BasePay'].sum()
print(sum_of_base_salary/ratio_of_police)

By looking at the comments another way would've been to use groupby statements

CodePudding user response：

You can try:

is_police_or_fire = df['JobTitle'].str.extract(r'(FIRE|POLICE)', expand=False)
out = df.groupby(is_police_or_fire)['BasePay'].mean()

CodePudding user response：

You are looking for groupby operation.