I have a df with many questions, each in a separate column. The rows are answers to those questions, which are on a scale of 1-5.:
q1 q2 q3 q4 q5
4 5 2 5 2
4 5 5 5 5
1 4 5 4 5
3 1 4 1 4
4 3 2 3 2
2 4 3 4 3
I would like to see, for each question, what percentage did each value get:
1 2 3 4 5
q1 0.17 0.17 0.17 0.50 0.00
q2 0.17 0.00 0.17 0.33 0.33
q3 0.00 0.33 0.17 0.17 0.33
q4 0.17 0.00 0.17 0.33 0.33
q5 0.00 0.33 0.17 0.17 0.33
How do I do this? I've been trying with groupby and value counts, but no luck, e.g.:
df.value_counts(subset=['q1','q2','q3','q4','q5']).reset_index()
df.groupby(['q1','q2','q3','q4','q5'].size().unstack().apply(lambda x:x*100/x.sum()).T.reset_index()
Any help appreciated
CodePudding user response:
You can use:
df.apply(lambda s: s.value_counts(normalize=True)).T.fillna(0).round(2)
output:
1 2 3 4 5
q1 0.17 0.17 0.17 0.50 0.00
q2 0.17 0.00 0.17 0.33 0.33
q3 0.00 0.33 0.17 0.17 0.33
q4 0.17 0.00 0.17 0.33 0.33
q5 0.00 0.33 0.17 0.17 0.33