Percentages of a single column's values in separate columns

For the following dataframe:

   person  choice
0  A       1
1  A       2
2  A       1
3  B       3
4  B       3
5  B       2
6  B       1
7  C       2

How can I find the percentage of each choice per person?

The output should be something like the following:

person  choice_1_count choice_2_count choice_3_count  total
A                    2              1              0      3 
B                    1              1              2      4
C                    0              1              0      1

These counts would then be used to find the percentages:

person  choice_1_percent  choice_2_percent  choice_3_percent
A                  66.67             33.33              0.00
B                  25.00             25.00             50.00
C                   0.00            100.00              0.00

The format of the final dataframe matters, because I need to sort and plot the percentage columns and use them in further analysis.

CodePudding user response：

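import pandas as pd
# Long-format counts per (person, choice), then each count as a percentage of that person's total.
counts = df.value_counts(['person', 'choice']).sort_index().to_frame('count')
counts['percent'] = 100 * counts['count'] / counts.groupby('person')['count'].transform('sum')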
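This gives one row per (person, choice) pair, not the wide layout the question asks for. A minimal sketch of how such a long result could be pivoted, assuming the sample data from the question; it uses groupby(...).size() in place of value_counts, and the names `long`, `share` and `wide` are only illustrative:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "person": list("AAABBBBC"),
    "choice": [1, 2, 1, 3, 3, 2, 1, 2],
})

# Long format: one row per (person, choice) pair that actually occurs.
long = df.groupby(["person", "choice"]).size()

# Each count as a percentage of that person's total.
share = 100 * long / long.groupby(level="person").transform("sum")

# Pivot the choices into columns; pairs that never occur become 0.
wide = share.unstack("choice", fill_value=0)
print(wide)
```

Pairs that never occur (for example person C with choice 1) are simply absent from the long format, which is why the fill_value is needed when pivoting.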

CodePudding user response：

Let's use crosstab to build the frequency table, normalizing over each index row to get percentages:

dist = pd.crosstab(df['person'], df['choice'], normalize='index') * 100

Result

choice          1           2     3
person                             
A       66.666667   33.333333   0.0
B       25.000000   25.000000  50.0
C        0.000000  100.000000   0.0

Then you can plot the percentages:

dist.plot(kind='bar')
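If the exact column names from the question are needed, along with the intermediate counts table and its total column, something along these lines may work. This is a sketch on the question's sample data; `counts` and `pct` are illustrative names, and rounding to two decimals is an assumption based on the desired output shown in the question:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "person": list("AAABBBBC"),
    "choice": [1, 2, 1, 3, 3, 2, 1, 2],
})

# Counts per person and choice, renamed to choice_<n>_count.
counts = pd.crosstab(df["person"], df["choice"])
counts = counts.add_prefix("choice_").add_suffix("_count")
counts["total"] = counts.sum(axis=1)

# Row-normalized percentages, rounded and renamed to choice_<n>_percent.
pct = (pd.crosstab(df["person"], df["choice"], normalize="index") * 100).round(2)
pct = pct.add_prefix("choice_").add_suffix("_percent")

print(counts)
print(pct)

# Sorting by a percentage column works like on any other dataframe.
print(pct.sort_values("choice_2_percent", ascending=False))
```

Because the percentage columns are ordinary float columns, they can be passed straight to sort_values or plot.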