I have the following DataFrame in pandas:
Score  Risk
30     High Risk
50     Medium Risk
70     Medium Risk
40     Medium Risk
80     Low Risk
35     High Risk
65     Medium Risk
90     Low Risk
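To make the answers below runnable, the sample data above can be rebuilt as a DataFrame. This is a sketch: the values are copied from the table, and the variable name `df` is assumed because it is the name the answers use.

```python
import pandas as pd

# Sample data from the question; the name `df` is assumed to match the answers
df = pd.DataFrame({
    'Score': [30, 50, 70, 40, 80, 35, 65, 90],
    'Risk': ['High Risk', 'Medium Risk', 'Medium Risk', 'Medium Risk',
             'Low Risk', 'High Risk', 'Medium Risk', 'Low Risk'],
})
print(df)
```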
I want to group the Risk column by its values and get the count and percentage for each group, plus a total row, like this:
Expected output:
Risk Category  Count  Percentage
High Risk          2       25.00
Medium Risk        4       50.00
Low Risk           2       25.00
Total              8      100.00
Can someone explain how I can achieve the expected output?
Answer:
You can use GroupBy.size to count the rows in each group, compute the percentages from those counts, join both with concat, add a Total row, and finally, if necessary, convert the index to a column:
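import pandas as pd

# Count rows per Risk value, then express each count as a percentage of all rows
s = df.groupby('Risk')['Score'].size()
df = pd.concat([s, s / s.sum() * 100], axis=1, keys=('count', 'Percentage'))
# Add a Total row; round before astype(int) so a float sum such as 99.999... is not truncated to 99
df.loc['Total'] = df.sum().round().astype(int)
print(df)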
             count  Percentage
Risk
High Risk        2        25.0
Low Risk         2        25.0
Medium Risk      4        50.0
Total            8       100.0
# Name the index 'Risk Category' and turn it into a regular column
df = df.rename_axis('Risk Category').reset_index()
print(df)
   Risk Category  count  Percentage
0      High Risk      2        25.0
1       Low Risk      2        25.0
2    Medium Risk      4        50.0
3          Total      8       100.0
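This output is sorted alphabetically and uses a `count` column, while the expected output lists High, Medium, Low with a `Count` column and two decimals. A minimal self-contained sketch of the same GroupBy.size approach that closes that gap: the `order` list is an assumption copied from the expected output, and the Total row is built as a separate one-row frame and concatenated, so `Count` stays an integer column.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Score': [30, 50, 70, 40, 80, 35, 65, 90],
    'Risk': ['High Risk', 'Medium Risk', 'Medium Risk', 'Medium Risk',
             'Low Risk', 'High Risk', 'Medium Risk', 'Low Risk'],
})

# Assumed display order, copied from the expected output
order = ['High Risk', 'Medium Risk', 'Low Risk']

# Count rows per Risk value, then put the groups in the desired order
counts = df.groupby('Risk')['Score'].size().reindex(order)

summary = pd.DataFrame({
    'Count': counts,
    'Percentage': (counts / counts.sum() * 100).round(2),
})

# One-row Total frame; concatenating two int Count columns keeps the int dtype
total = pd.DataFrame({'Count': [counts.sum()], 'Percentage': [100.0]}, index=['Total'])
summary = pd.concat([summary, total]).rename_axis('Risk Category').reset_index()

# Show percentages with two decimals, as in the expected output
print(summary.to_string(index=False, float_format='{:.2f}'.format))
```

If a category can be absent from the data, `reindex` yields NaN for it; passing `fill_value=0` to `reindex` would show it as zero instead.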
Answer:
You can also get a fairly clean answer using pivot_table, since it can create the margin totals (the Total row) for you automatically:
# margins is a boolean flag; margins=True appends a row named by margins_name
summary = (
    df.pivot_table(
        index='Risk', aggfunc='count', margins=True, margins_name='Total'
    )
    .assign(Percentage=lambda t: t['Score'] / t.loc['Total', 'Score'] * 100)
    .rename_axis('Risk Category')
    .reset_index()
)
print(summary)
   Risk Category  Score  Percentage
0      High Risk      2        25.0
1       Low Risk      2        25.0
2    Medium Risk      4        50.0
3          Total      8       100.0
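The pivot_table result can be brought in line with the expected output the same way: reindex the rows (the order below is assumed from the question, with the margin row last), rename `Score` to `Count`, and round the percentages. A sketch, noting that `margins` takes a boolean:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Score': [30, 50, 70, 40, 80, 35, 65, 90],
    'Risk': ['High Risk', 'Medium Risk', 'Medium Risk', 'Medium Risk',
             'Low Risk', 'High Risk', 'Medium Risk', 'Low Risk'],
})

summary = (
    df.pivot_table(index='Risk', aggfunc='count', margins=True, margins_name='Total')
    # Assumed row order from the expected output, keeping the margin row last
    .reindex(['High Risk', 'Medium Risk', 'Low Risk', 'Total'])
    .rename(columns={'Score': 'Count'})
    .assign(Percentage=lambda t: (t['Count'] / t.loc['Total', 'Count'] * 100).round(2))
    .rename_axis('Risk Category')
    .reset_index()
)
print(summary)
```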