How to pivot a dataframe to create 4 columns from 1 existing column and calculate a percentage in each new cell

I have this dataframe:

Country AgeRepartition     Count
USA     above 20           10
USA     less than 20       50
USA     above 50           40
Canada  above 20           50
Canada  less than 20       10
Canada  above 50           30
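
For anyone who wants to run the answers, here is a minimal sketch that rebuilds the sample data as a pandas DataFrame. The column names and values are copied from the table above; the variable name df is only assumed to match what the answers use.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "Canada"],
    "AgeRepartition": ["above 20", "less than 20", "above 50",
                       "above 20", "less than 20", "above 50"],
    "Count": [10, 50, 40, 50, 10, 30],
})
print(df)
```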

I would like to pivot this dataframe to have one column by age repartition type and have percentage as values.

Expected output:

Country above 20 less than 20 above 50  
USA     10%      50%          40%
Canada  55%      11%          33%

The percentage is each group's share of the total count for its country. For example, for USA, "above 20" is 10 / (10 + 50 + 40) = 10%.
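
The same arithmetic written out in plain Python, as a small sketch for the USA rows only (the names usa_counts and usa_pct are made up for illustration):

```python
# Counts for one country, copied from the sample table
usa_counts = {"above 20": 10, "less than 20": 50, "above 50": 40}

total = sum(usa_counts.values())  # 10 + 50 + 40 = 100

# Share of each age group within the country, as a percentage
usa_pct = {group: 100 * count / total for group, count in usa_counts.items()}
print(usa_pct)
```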

How can I do this?

CodePudding user response：

You can use pivot + pipe: inside the pipe, divide each row by its sum and multiply by 100:

df2 = (df
 .pivot(index='Country', columns='AgeRepartition', values='Count')
 .pipe(lambda d: d.div(d.sum(axis=1), axis=0).mul(100))
)

output:

AgeRepartition  above 20  above 50  less than 20
Country                                         
Canada             55.56     33.33         11.11
USA                10.00     40.00         50.00
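
If the result should contain strings like "10%" as in the question, rather than floats, one possible follow-up is to format each cell afterwards. This is only a sketch: it assumes rounding to the nearest whole percent is acceptable (the expected output in the question appears to truncate, so Canada's 55.56 becomes 56% here), and pct and labels are names chosen for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "Canada"],
    "AgeRepartition": ["above 20", "less than 20", "above 50",
                       "above 20", "less than 20", "above 50"],
    "Count": [10, 50, 40, 50, 10, 30],
})

# Same pivot + pipe as above: row-wise percentages as floats
pct = (df
       .pivot(index="Country", columns="AgeRepartition", values="Count")
       .pipe(lambda d: d.div(d.sum(axis=1), axis=0).mul(100)))

# Format every cell as a whole-number percentage string
labels = pct.apply(lambda col: col.map(lambda v: f"{v:.0f}%"))
print(labels)
```

Keeping the numeric frame (pct) separate from the formatted one (labels) means the floats stay available for further calculations.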

CodePudding user response：

Another way is to use groupby + transform('sum') + rdiv to compute the percentages, use assign to write them back to the Count column, and then pivot:

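# share of each row within its country, truncated to an integer and suffixed with '%'
out = (df.assign(Count=df.groupby('Country')['Count'].transform('sum').rdiv(df['Count']).mul(100)
                 .astype(int).astype(str).add('%'))
       # keyword arguments, since recent pandas versions reject positional arguments to pivot
       .pivot(index='Country', columns='AgeRepartition', values='Count')
       .reset_index().rename_axis(columns=[None]))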

Output:

  Country above 20 above 50 less than 20
0  Canada      55%      33%          11%
1  USA         10%      40%          50%

CodePudding user response：

Use DataFrame.pivot and divide the values by the row sums; DataFrame.reindex restores the original order of the rows and columns:

df = (df.pivot(index='Country', columns='AgeRepartition', values='Count')
        .reindex(columns=df['AgeRepartition'].unique(), index=df['Country'].unique()))
df = df.div(df.sum(axis=1), axis=0).mul(100)
print (df)
AgeRepartition   above 20  less than 20   above 50
Country                                           
USA             10.000000     50.000000  40.000000
Canada          55.555556     11.111111  33.333333

Another solution, which keeps the original order in both the new index and the new columns, is to use ordered categoricals:

df['Country'] = pd.Categorical(df['Country'], 
                               ordered=True, 
                               categories=df['Country'].unique())
df['AgeRepartition'] = pd.Categorical(df['AgeRepartition'], 
                                      ordered=True, 
                                      categories=df['AgeRepartition'].unique())
df = df.pivot(index='Country', columns='AgeRepartition', values='Count')
df = df.div(df.sum(axis=1), axis=0).mul(100)
print (df)
AgeRepartition   above 20  less than 20   above 50
Country                                           
USA             10.000000     50.000000  40.000000
Canada          55.555556     11.111111  33.333333

CodePudding user response：

The easiest way is pivot_table from the pandas library:

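import pandas as pd

# pivot_table needs the DataFrame as its first argument
df = pd.pivot_table(df, index=['Country'], columns='AgeRepartition', values='Count', aggfunc='first')
# this gives the raw counts; divide each row by its total to get percentages
df = df.div(df.sum(axis=1), axis=0).mul(100)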
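
One practical difference worth noting: pivot raises an error when the same Country / AgeRepartition pair appears more than once, while pivot_table aggregates the duplicates. Below is a sketch with made-up data (the duplicated row is hypothetical and not part of the question). It uses aggfunc='sum' so that duplicate counts are added together before the percentages are computed.

```python
import pandas as pd

# Hypothetical data: the USA / "above 20" pair appears twice
df = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "USA"],
    "AgeRepartition": ["above 20", "above 20", "less than 20", "above 50"],
    "Count": [4, 6, 50, 40],
})

# pivot cannot reshape when an index/column pair is duplicated
try:
    df.pivot(index="Country", columns="AgeRepartition", values="Count")
    pivot_ok = True
except ValueError as err:
    pivot_ok = False
    print("pivot failed:", err)

# pivot_table sums the duplicates first, then each row is scaled to 100
counts = pd.pivot_table(df, index="Country", columns="AgeRepartition",
                        values="Count", aggfunc="sum")
pct = counts.div(counts.sum(axis=1), axis=0).mul(100)
print(pct)
```

With aggfunc='first', as in the snippet above, only the first of the duplicated rows would be kept, so 'sum' is usually the safer choice when the rows are counts.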