Home > database >  How to get a group by with aggregations considering the value of the columns of a dataframe
How to get a group by with aggregations considering the value of the columns of a dataframe

Time:02-04

i have a pandas dataframe like this:

id    gender  column_1  column_2  column_3  column_n
10      male     a          b         a           b
10      female   b          c         b           c
10      male     c          c         a           a
10      male     b          a         a           b

I want to get this as output:

id    column_name     male_%_a  male_%_b   male_%_c   female_%_a  female_%_b   female_%_c
10    column_1           33.3       33.3       33.3       0            100        0
10    column_2           33.3       33.3       33.3       0             0        100
10    column_3           100        0           0         0            100        0
10    column_n           33.3      66.7         0         0             0        100

The dataframe can have any number of column_1 to column_n and will always have a column idand gender

I tried to groupby like this: df.gropby('gender') but i am having a hard time to get the calculations to get the % of each label a,b,c

CodePudding user response:

You can melt, crosstab and unstack:

df2 = df.melt(['id', 'gender'], var_name='columns_name')

out = (
 pd.crosstab([df2['id'], df2['gender'], df2['columns_name']], df2['value'])
   .pipe(lambda d: d.div(d.sum(axis=1), axis=0).mul(100).round(2))
   .unstack('gender')
)

out.columns = out.columns.map(lambda x: f'{x[1]}_%_{x[0]}')

out = out.reset_index()

Output:

   id columns_name  female_%_a  male_%_a  female_%_b  male_%_b  female_%_c  male_%_c
0  10     column_1         0.0     33.33       100.0     33.33         0.0     33.33
1  10     column_2         0.0     33.33         0.0     33.33       100.0     33.33
2  10     column_3         0.0    100.00       100.0      0.00         0.0      0.00
3  10     column_n         0.0     33.33         0.0     66.67       100.0      0.00
  • Related