Home > OS >  calculating and graphing rates per category in python
calculating and graphing rates per category in python

Time:04-09

I am working on an employee attrition problem. I have columns for department and if the employee quit or not.

Since some departments have more employees working in them, I would assume that they account for the majority of attrition.

So I would like to graph the attrition rate per department. For example, out of the 1000 people that work in IT 200 quit (20% attrition rate).

What I have been doing is creating pivot tables and cross table to get the relevant information then calculating manually attrition rates. But I feel like this can be done more efficiently.

What I have currently is:

  1. Sample DataFrame
df= pd.DataFrame({'department': ['sales','sales','HR','sales','sales','HR','sales','R&D','HR','sales'],
                   'quit': ['yes','no','no','yes','no','yes','yes','no','yes','no'],
                   'employee_count': [1,1,1,1,1,1,1,1,1,1]})
  1. Pivot Table to find out how many employees are in each department

pivot = pd.pivot_table(df, values='employee_count', 
                    columns=['department'], aggfunc=np.sum)

which gives output of

. HR R&D Sales
employee_count 3 1 6
  1. Crosstab to find the number of employees that quit per department
pivot2=pd.crosstab(values=df['employee_count'],
                  index=df['department'],
                  columns=df['quit'],
                  aggfunc=np.sum)
Quit no yes
HR 1 2
R&D 1 nan
Sales 3 3
  1. Calculate Manually and Plot
import matplotlib.pyplot as plt

plt.rcParams['axes.facecolor'] = 'white'
fig = plt.figure()
fig.patch.set_facecolor('white')

names = ['HR','sales']
values = [66.66666667,50]

plt.figure()
plt.bar(names, values)

any help would be appreciated, thanks for your time

CodePudding user response:

Pandas offers to group by certain columns. In your case we can aggregate the results based on the department and whether someone quit. Then use a pivot to change the format of the table and finally calculate the total number of employees and quit rate:

# obtain the relevant results by grouping
agg_df = df.groupby(by=['department', 'quit']).agg({'employee_count': 'sum'})
agg_df.reset_index(drop=False, inplace=True)

# pivot the table for further usage
pivot_df = agg_df.pivot(columns=['quit'], index=['department'], values='employee_count')
pivot_df.fillna(0, inplace=True)

# calculate final statistics
pivot_df['total'] = pivot_df['yes']   pivot_df['no']
pivot_df['quit_rate'] = pivot_df['yes'] / pivot_df['total']
  • Related