calculating and graphing rates per category in python-CodePudding

I am working on an employee attrition problem. I have columns for department and if the employee quit or not.

Since some departments have more employees working in them, I would assume that they account for the majority of attrition.

So I would like to graph the attrition rate per department. For example, out of the 1000 people that work in IT 200 quit (20% attrition rate).

What I have been doing is creating pivot tables and cross table to get the relevant information then calculating manually attrition rates. But I feel like this can be done more efficiently.

What I have currently is:

Sample DataFrame

df= pd.DataFrame({'department': ['sales','sales','HR','sales','sales','HR','sales','R&D','HR','sales'],
                   'quit': ['yes','no','no','yes','no','yes','yes','no','yes','no'],
                   'employee_count': [1,1,1,1,1,1,1,1,1,1]})

Pivot Table to find out how many employees are in each department


pivot = pd.pivot_table(df, values='employee_count', 
                    columns=['department'], aggfunc=np.sum)

which gives output of

.	HR	R&D	Sales
employee_count	3	1	6

Crosstab to find the number of employees that quit per department

pivot2=pd.crosstab(values=df['employee_count'],
                  index=df['department'],
                  columns=df['quit'],
                  aggfunc=np.sum)

Quit	no	yes
HR	1	2
R&D	1	nan
Sales	3	3

Calculate Manually and Plot

import matplotlib.pyplot as plt

plt.rcParams['axes.facecolor'] = 'white'
fig = plt.figure()
fig.patch.set_facecolor('white')

names = ['HR','sales']
values = [66.66666667,50]

plt.figure()
plt.bar(names, values)

any help would be appreciated, thanks for your time

CodePudding user response：

Pandas offers to group by certain columns. In your case we can aggregate the results based on the department and whether someone quit. Then use a pivot to change the format of the table and finally calculate the total number of employees and quit rate:

# obtain the relevant results by grouping
agg_df = df.groupby(by=['department', 'quit']).agg({'employee_count': 'sum'})
agg_df.reset_index(drop=False, inplace=True)

# pivot the table for further usage
pivot_df = agg_df.pivot(columns=['quit'], index=['department'], values='employee_count')
pivot_df.fillna(0, inplace=True)

# calculate final statistics
pivot_df['total'] = pivot_df['yes']   pivot_df['no']
pivot_df['quit_rate'] = pivot_df['yes'] / pivot_df['total']