Get the count and percentage by grouping values in Pandas-CodePudding

I have the following Dataframe in pandas,

Score   Risk
30      High Risk
50      Medium Risk
70      Medium Risk
40      Medium Risk
80      Low Risk
35      High Risk
65      Medium Risk
90      Low Risk

I want get total count, group by count and percentage the Risk column by its values, like this:

Expected output
Risk Category   Count   Percentage
High Risk       2       25.00
Medium Risk     4       50.00
Low Risk        2       25.00
Total           8       100.00

Can someone explain how can I achieve the expected output.

CodePudding user response：

You can use GroupBy.size with count percentages, join in concat, add total row and last if necessary convert index to column:

s = df.groupby('Risk')['Score'].size()
df = pd.concat([s, s / s.sum() * 100], axis=1, keys=('count','Percentage'))
df.loc['Total'] = df.sum().astype(int)
print (df)
             count  Percentage
Risk                          
High Risk        2        25.0
Low Risk         2        25.0
Medium Risk      4        50.0
Total            8       100.0


df = df.rename_axis('Risk Category').reset_index()
print (df)
  Risk Category  count  Percentage
0     High Risk      2        25.0
1      Low Risk      2        25.0
2   Medium Risk      4        50.0
3         Total      8       100.0

CodePudding user response：

You can also get a fairly clean answer using pivot_table since this can automatically create the margin totals for you.

summary = (
    df.pivot_table(
        index='Risk', aggfunc='count', margins='row', margins_name='Total'
    )
    .assign(Percentage=lambda df: df['Score'] / df.loc['Total', 'Score'] * 100)
    .rename_axis('Risk Category')
    .reset_index()
)

print(summary)
  Risk Category  Score  Percentage
0     High Risk      2        25.0
1      Low Risk      2        25.0
2   Medium Risk      4        50.0
3         Total      8       100.0