so i have data about salaries where the columns are:
Index(['Id', 'EmployeeName', 'JobTitle', 'BasePay', 'OvertimePay', 'OtherPay',
'Benefits', 'TotalPay', 'TotalPayBenefits', 'Year', 'Notes', 'Agency',
'Status'],
dtype='object')
where i created two extra columns to check whether their jobTitle has to do with police or fire fighter department
def find_police(x):
return "POLICE" in x
def find_fire(x):
return 'FIRE' in x
# use apply to search for it in JobTitle
sf_sal["isPolice"] = sf_sal["JobTitle"].apply(find_police)
sf_sal["isFire"] = sf_sal["JobTitle"].apply(find_fire)
sf_sal[["JobTitle", "isPolice", "isFire"]]
So what i would like to do is compare the mean salaries of the police department and firefigher department, i get the ratio of police to firefighters
ratio_of_police = sf_sal["isPolice"].sum()
ratio_of_fire = sf_sal["isFire"].sum()
by using the basesalary columns, i want to sum up all the rows that have true next to the ispolice column and the same with firefighters
One way i attempted this was
sf_mask = sf_sal['isPolice'] == True
all_police = sf_sal[sf_mask]
sum_of_base_salary = all_police['BasePay'].sum()
print(sum_of_base_salary/ratio_of_police)
sf_mask = sf_sal['isFire'] == True
all_fire = sf_sal[sf_mask]
sum_of_base_salary = all_fire['BasePay'].sum()
print(sum_of_base_salary/ratio_of_police)
By looking at the comments another way would've been to use groupby statements
CodePudding user response:
You can try:
is_police_or_fire = df['JobTitle'].str.extract(r'(FIRE|POLICE)', expand=False)
out = df.groupby(is_police_or_fire)['BasePay'].mean()
CodePudding user response:
You are looking for groupby operation.