I am fairly new to Python, so excuse me if this question has been answered before or can be easily solved. I have a long data frame with numerical variables and categorical variables. It looks something like this:
Category Detail Gender Weight
Food Apple Female 30
Food Apple Male 40
Beverage Milk Female 10
Beverage Milk Male 5
Beverage Milk Male 20
Food Banana Female 50
What I want to do is this: Group by Category and Detail and then count all instances of 'Female' and 'Male'. I then want to weight these instances (see column 'Weight'). This should be done by taking the value from column 'Weight' and then deviding that by the summed weight. (so here for the group: Beverage, Milk, Male, it would be 25 devided by 35). It also would be nice to have the share of the gender. At the end of the day I want my data frame to look something like this:
Category Detail Female Male
Beverage Milk 29% 71%
Food Apple 43% 57%
Food Banana 100% 0%
So in addition to the grouping, I want to kind of 'unmelt' the data frame by taking Female and Male an adding them as new columns.
I could just sum the weights with groupby on different levels, but how can I reshape the data frame in that way of adding these new columns?
Is there any way to do that? Thanks for any help in advance!
CodePudding user response:
Use DataFrame.pivot_table
with divide summed values, last multiple by 100
and round
:
df = df.pivot_table(index=['Category','Detail'],
columns='Gender', values='Weight', aggfunc='sum', fill_value=0)
df = df.div(df.sum(axis=1), axis=0).mul(100).round().reset_index()
print (df)
Gender Category Detail Female Male
0 Beverage Milk 29.0 71.0
1 Food Apple 43.0 57.0
2 Food Banana 100.0 0.0
For percentages use:
df = df.pivot_table(index=['Category','Detail'],
columns='Gender', values='Weight', aggfunc='sum', fill_value=0)
df = df.div(df.sum(axis=1), axis=0).applymap("{:.2%}".format).reset_index()
print (df)
Gender Category Detail Female Male
0 Beverage Milk 28.57% 71.43%
1 Food Apple 42.86% 57.14%
2 Food Banana 100.00% 0.00%
CodePudding user response:
Like so
df2 = df.pivot_table(
index=['Category', 'Detail'],
columns='Gender',
values='Weight',
aggfunc='sum'
).fillna(0)
final = df2[['Female', 'Male']].div(df2.sum(axis=1), axis=0)
Gender Female Male
Category Detail
Beverage Milk 0.285714 0.714286
Food Apple 0.428571 0.571429
Banana 1.000000 0.000000