i have a pandas dataframe like this:
id gender column_1 column_2 column_3 column_n
10 male a b a b
10 female b c b c
10 male c c a a
10 male b a a b
I want to get this as output:
id column_name male_%_a male_%_b male_%_c female_%_a female_%_b female_%_c
10 column_1 33.3 33.3 33.3 0 100 0
10 column_2 33.3 33.3 33.3 0 0 100
10 column_3 100 0 0 0 100 0
10 column_n 33.3 66.7 0 0 0 100
The dataframe can have any number of column_1 to column_n and will always have a column id
and gender
I tried to groupby like this: df.gropby('gender') but i am having a hard time to get the calculations to get the % of each label a,b,c
CodePudding user response:
You can melt
, crosstab
and unstack
:
df2 = df.melt(['id', 'gender'], var_name='columns_name')
out = (
pd.crosstab([df2['id'], df2['gender'], df2['columns_name']], df2['value'])
.pipe(lambda d: d.div(d.sum(axis=1), axis=0).mul(100).round(2))
.unstack('gender')
)
out.columns = out.columns.map(lambda x: f'{x[1]}_%_{x[0]}')
out = out.reset_index()
Output:
id columns_name female_%_a male_%_a female_%_b male_%_b female_%_c male_%_c
0 10 column_1 0.0 33.33 100.0 33.33 0.0 33.33
1 10 column_2 0.0 33.33 0.0 33.33 100.0 33.33
2 10 column_3 0.0 100.00 100.0 0.00 0.0 0.00
3 10 column_n 0.0 33.33 0.0 66.67 100.0 0.00