Create dataframe using groupby of percentage with condition

Time:09-19

I want to build a pandas dataframe to use for a seaborn bar plot. My problem is that I am not able to group the data by a string column. If I use the value_counts() function it works fine, but the result is not a dataframe. A detailed explanation follows:

My data is

Data_Hr = {"Age": [41,49,37,33,27], "Attrition": ["Yes", "No", "Yes", "No", "No"], "ageRange":[ 40-45 45-50 35-40 30-35 25-30]

Now I want to calculate the percentage of rows with Attrition equal to "Yes", grouped by "ageRange". Below is the expression I am using, but the result is not converted into a dataframe.

df[df.Attrition == 'Yes']['ageRange'].value_counts()/df['ageRange'].value_counts()*100

Any other method for plotting the attrition % would also be welcome.

Thanks in advance.

CodePudding user response:

Your code works fine. Note that the entries of the last list are missing their quotes and commas, and the dictionary is missing its closing brace. You should change it to:

Data_Hr = {"Age": [41,49,37,33,27], "Attrition": ["Yes", "No", "Yes", "No", "No"], "ageRange":['40-45', '45-50', '35-40', '30-35', '25-30']}

Then if you want to plot a graph from the results you should format the results as a dataframe:

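import pandas as pd
df = pd.DataFrame(Data_Hr)
results = (df[df.Attrition == 'Yes']['ageRange'].value_counts()/df['ageRange'].value_counts()*100).to_frame('ageRange')  # percentages, indexed by age range
results.dropna(subset=['ageRange'], inplace=True)  # drop age ranges that have no 'Yes' rows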

And to display using seaborn:

import seaborn as sns
sns.barplot(data=results, x='ageRange', y=results.index, color='Blue').set_title('Attrition % by Age Range')
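
If you would rather use groupby directly, as the question title suggests, here is a minimal sketch. It assumes the corrected Data_Hr from above. The names is_yes and attrition_pct are my own, not from the question. Within each group, the mean of a True/False column is the share of "Yes" rows, so multiplying by 100 gives the percentage. Age ranges with no "Yes" rows come out as 0 rather than NaN.

```python
import pandas as pd

# Corrected sample data from the answer above.
Data_Hr = {
    "Age": [41, 49, 37, 33, 27],
    "Attrition": ["Yes", "No", "Yes", "No", "No"],
    "ageRange": ["40-45", "45-50", "35-40", "30-35", "25-30"],
}
df = pd.DataFrame(Data_Hr)

# True where Attrition is "Yes"; the mean of a boolean column is the share of True values.
is_yes = df["Attrition"].eq("Yes")

# Group the flags by age range, average them, and scale to a percentage.
# reset_index turns the grouped Series into a two-column dataframe.
attrition_pct = (
    is_yes.groupby(df["ageRange"])
    .mean()
    .mul(100)
    .reset_index(name="attrition_pct")
)
print(attrition_pct)
```

The result is an ordinary dataframe with the columns ageRange and attrition_pct, so it should be possible to plot it with sns.barplot(data=attrition_pct, x='ageRange', y='attrition_pct').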