Home > Back-end >  How can you group a data frame and reshape from long to wide?
How can you group a data frame and reshape from long to wide?

Time:11-23

I am fairly new to Python, so excuse me if this question has been answered before or can be easily solved. I have a long data frame with numerical variables and categorical variables. It looks something like this:

 Category Detail Gender Weight

 Food     Apple  Female 30   
 Food     Apple  Male   40 
 Beverage Milk   Female 10
 Beverage Milk   Male   5
 Beverage Milk   Male   20 
 Food     Banana Female 50

What I want to do is this: Group by Category and Detail and then count all instances of 'Female' and 'Male'. I then want to weight these instances (see column 'Weight'). This should be done by taking the value from column 'Weight' and then deviding that by the summed weight. (so here for the group: Beverage, Milk, Male, it would be 25 devided by 35). It also would be nice to have the share of the gender. At the end of the day I want my data frame to look something like this:

Category Detail Female Male 
Beverage Milk   29%    71%
Food     Apple  43%    57%
Food     Banana 100%   0%

So in addition to the grouping, I want to kind of 'unmelt' the data frame by taking Female and Male an adding them as new columns.

I could just sum the weights with groupby on different levels, but how can I reshape the data frame in that way of adding these new columns?

Is there any way to do that? Thanks for any help in advance!

CodePudding user response:

Use DataFrame.pivot_table with divide summed values, last multiple by 100 and round:

df = df.pivot_table(index=['Category','Detail'],
                    columns='Gender', values='Weight', aggfunc='sum', fill_value=0)
df = df.div(df.sum(axis=1), axis=0).mul(100).round().reset_index()
print (df)
Gender  Category  Detail  Female  Male
0       Beverage    Milk    29.0  71.0
1           Food   Apple    43.0  57.0
2           Food  Banana   100.0   0.0

For percentages use:

df = df.pivot_table(index=['Category','Detail'],
                    columns='Gender', values='Weight', aggfunc='sum', fill_value=0)
df = df.div(df.sum(axis=1), axis=0).applymap("{:.2%}".format).reset_index()
print (df)
Gender  Category  Detail   Female    Male
0       Beverage    Milk   28.57%  71.43%
1           Food   Apple   42.86%  57.14%
2           Food  Banana  100.00%   0.00%

CodePudding user response:

Like so

df2 = df.pivot_table(
    index=['Category', 'Detail'], 
    columns='Gender', 
    values='Weight', 
    aggfunc='sum'
).fillna(0)
final = df2[['Female', 'Male']].div(df2.sum(axis=1), axis=0)
Gender             Female      Male
Category Detail                    
Beverage Milk    0.285714  0.714286
Food     Apple   0.428571  0.571429
         Banana  1.000000  0.000000
  • Related