Home > Back-end >  Get 'Unique Count' and 'Percentage' after Group By
Get 'Unique Count' and 'Percentage' after Group By

Time:10-15

I have the following dataframe of students with their grade:

df = pd.DataFrame({'student': 'AAA AAA BBB BBB BBB CCC DDD DDD DDD DDD'.split(),
                   'grade': ['B', 'B', 'C', 'A', 'A', 'B', 'B', 'A', 'B', 'B']})

print(df)

    student grade
0     AAA     B
1     AAA     B
2     BBB     C
3     BBB     A
4     BBB     A
5     CCC     B
6     DDD     B
7     DDD     A
8     DDD     B
9     DDD     B

I have to count unique student for each grade along with percentage. I'm trying the following script to get the unique student count:

df.groupby('grade')['student'].nunique()

grade
  A    2
  B    3
  C    1

However, I'm having troubles to get the percentage of students out of total students (we have 4 students in total):

Desired Output:

grade student_count percentage
 A         2           50.00
 B         3           75.00
 C         1           25.00

What would be the ideal way of getting the desired output? Any suggestions would be appreciated. Thanks!

CodePudding user response:

Just divide into the total number of students

df.groupby('grade')['student'].nunique() / df['student'].nunique()

CodePudding user response:

To produce your desired output:

df1 = pd.pivot_table(
    df,
    index='grade',
    values='student',
    aggfunc='nunique'
    )

df1['percentage'] = (df1['student']/df['student'].nunique())*100
df1.reset_index()
  • Related