Home > front end >  Get percentage of occurrence of each value for certain columns
Get percentage of occurrence of each value for certain columns

Time:11-09

I have a df with many questions, each in a separate column. The rows are answers to those questions, which are on a scale of 1-5.:

q1  q2  q3  q4  q5
4   5   2   5   2
4   5   5   5   5
1   4   5   4   5
3   1   4   1   4
4   3   2   3   2
2   4   3   4   3

I would like to see, for each question, what percentage did each value get:

    1          2       3       4       5
q1  0.17    0.17    0.17    0.50    0.00
q2  0.17    0.00    0.17    0.33    0.33
q3  0.00    0.33    0.17    0.17    0.33
q4  0.17    0.00    0.17    0.33    0.33
q5  0.00    0.33    0.17    0.17    0.33

How do I do this? I've been trying with groupby and value counts, but no luck, e.g.:

df.value_counts(subset=['q1','q2','q3','q4','q5']).reset_index()

df.groupby(['q1','q2','q3','q4','q5'].size().unstack().apply(lambda x:x*100/x.sum()).T.reset_index()

Any help appreciated

CodePudding user response:

You can use:

df.apply(lambda s: s.value_counts(normalize=True)).T.fillna(0).round(2)

output:

       1     2     3     4     5
q1  0.17  0.17  0.17  0.50  0.00
q2  0.17  0.00  0.17  0.33  0.33
q3  0.00  0.33  0.17  0.17  0.33
q4  0.17  0.00  0.17  0.33  0.33
q5  0.00  0.33  0.17  0.17  0.33
  • Related