group by two columns, and compute percentage at group level in pandas

Time:06-09

I have a dataframe that looks like this:

gender region  
    M     A    
    M     A    
    M     A    
    F     A    
    F     A    
    F     A    
    M     B    
    M     B    
    F     B    
    F     B    
    F     B    
    F     B    

What I want to do is group by region and compute the share of each gender within each region.

I would like to obtain something like this:

region    gender    share
A           M        0.5
A           F        0.5
B           M        0.33
B           F        0.66

I can easily get the count per gender per region by:

df.groupby(['region', 'gender']).size().rename("count") 

But I am not sure how to get the share from there.

CodePudding user response:

Use SeriesGroupBy.value_counts with normalize=True, which returns each value's share within its group instead of the raw count:

# Note: this overwrites df with the per-region gender shares.
df = df.groupby(['region'])['gender'].value_counts(normalize=True).reset_index(name='share')
print(df)
  region gender     share
0      A      F  0.500000
1      A      M  0.500000
2      B      F  0.666667
3      B      M  0.333333
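
If you would rather start from the counts you already have, one possible approach is to divide each count by its region's total. The following is a minimal sketch, assuming the sample data from the question; the names counts, totals and shares are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "gender": list("MMMFFF") + list("MMFFFF"),
    "region": ["A"] * 6 + ["B"] * 6,
})

# Count rows per (region, gender), as in the question.
counts = df.groupby(["region", "gender"]).size().rename("count")

# transform("sum") broadcasts each region's total back onto every
# (region, gender) row, so the division is aligned row by row.
totals = counts.groupby(level="region").transform("sum")
shares = (counts / totals).rename("share").reset_index()
print(shares)
```

This should give the same shares as value_counts(normalize=True), and it keeps the intermediate counts available if you need them as well.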