I have a dataframe which looks like this:
gender region
M A
M A
M A
F A
F A
F A
M B
M B
F B
F B
F B
F B
What I want to do is group by region and compute the share of each gender within each group,
so I would like to obtain something like this:
region gender share
A M 0.5
A F 0.5
B M 0.33
B F 0.66
I can easily get the count per gender per region by:
df.groupby(['region', 'gender']).size().rename("count")
but I am not sure how to get the share from there.
Answer:
Use SeriesGroupBy.value_counts with normalize=True:
# Relative frequency of each gender within each region
df = df.groupby(['region'])['gender'].value_counts(normalize=True).reset_index(name='share')
print(df)
region gender share
0 A F 0.500000
1 A M 0.500000
2 B F 0.666667
3 B M 0.333333
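If you would rather build on the count Series from the question, one possible sketch is to divide each count by the total of its region. The DataFrame construction below is my reconstruction of the sample data, and the names `counts`, `share`, and `result` are only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the original frame).
df = pd.DataFrame({
    "gender": ["M", "M", "M", "F", "F", "F", "M", "M", "F", "F", "F", "F"],
    "region": ["A"] * 6 + ["B"] * 6,
})

# Count per (region, gender), exactly as in the question.
counts = df.groupby(["region", "gender"]).size().rename("count")

# Total per region, broadcast back onto the (region, gender) index,
# so each count is divided by the size of its own region.
share = counts / counts.groupby(level="region").transform("sum")

# Back to a flat frame with columns region, gender, share.
result = share.rename("share").reset_index()
print(result)
```

This should give the same shares as value_counts(normalize=True), and it keeps the intermediate counts around if you need both the count and the share.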