For the following dataframe:
  person  choice
0      A       1
1      A       2
2      A       1
3      B       3
4      B       3
5      B       2
6      B       1
7      C       2
How can I find the percentage of each choice per person?
The output should be something like the following:
person  choice_1_count  choice_2_count  choice_3_count  total
A                    2               1               0      3
B                    1               1               2      4
C                    0               1               0      1
to be used to find percentages:
person  choice_1_percent  choice_2_percent  choice_3_percent
A                  66.67             33.33              0.00
B                  25.00             25.00             50.00
C                   0.00            100.00              0.00
The format of the final dataframe matters, for example for sorting and plotting the percentage columns and for further analysis.
CodePudding user response:
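# Count each (person, choice) pair, then divide by that person's total and scale to percent
df = pd.DataFrame(df.value_counts(['person', 'choice']).sort_index(), columns=["count"])
df["percent"] = df["count"] / df.groupby('person')['count'].transform('sum') * 100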
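For reference, here is a self-contained sketch of this approach that rebuilds the sample data from the question. The names `counts` and `wide` are illustrative and not part of the original answer, and the final `unstack` step is an assumed extra that pivots the long result into one column per choice.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "person": ["A", "A", "A", "B", "B", "B", "B", "C"],
    "choice": [1, 2, 1, 3, 3, 2, 1, 2],
})

# Long format: one row per (person, choice) pair that actually occurs
counts = df.value_counts(["person", "choice"]).sort_index().rename("count").to_frame()
counts["percent"] = (
    counts["count"] / counts.groupby("person")["count"].transform("sum") * 100
)

# Optional: pivot to wide format, filling combinations that never occur with 0
wide = counts["percent"].unstack(fill_value=0)
```

Note that the long format only lists combinations that occur in the data, so zero-percent choices appear only after the `unstack` step.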
CodePudding user response:
Let's use crosstab to calculate a frequency table and normalize across the index axis to calculate percentages:
dist = pd.crosstab(df['person'], df['choice'], normalize='index') * 100
Result
choice          1           2     3
person
A       66.666667   33.333333   0.0
B       25.000000   25.000000  50.0
C        0.000000  100.000000   0.0
Then you can plot the percentages
dist.plot(kind='bar')
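If the exact column layout from the question is needed, the crosstab result can be relabeled. The following is a minimal sketch that rebuilds the sample data; the names `counts`, `percents`, and `ranked` are illustrative assumptions, and rounding to two decimals is chosen only to match the question's expected output.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "person": ["A", "A", "A", "B", "B", "B", "B", "C"],
    "choice": [1, 2, 1, 3, 3, 2, 1, 2],
})

# Raw counts per person, relabeled to the question's column names
counts = pd.crosstab(df["person"], df["choice"])
counts.columns = [f"choice_{c}_count" for c in counts.columns]
counts["total"] = counts.sum(axis=1)

# Row-normalized percentages, rounded to two decimals
percents = pd.crosstab(df["person"], df["choice"], normalize="index").mul(100).round(2)
percents.columns = [f"choice_{c}_percent" for c in percents.columns]

# Sorting by a percentage column works as on any DataFrame
ranked = percents.sort_values("choice_1_percent", ascending=False)
```

Because `percents` is an ordinary wide DataFrame indexed by person, it can be sorted, plotted, or joined back to `counts` for further analysis.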