I have this dataframe:
Country  AgeRepartition  Count
USA      above 20        10
USA      less than 20    50
USA      above 50        40
Canada   above 20        50
Canada   less than 20    10
Canada   above 50        30
I would like to pivot this dataframe so that there is one column per age repartition type, with percentages as values.
Expected output:
Country  above 20  less than 20  above 50
USA      10%       50%           40%
Canada   55%       11%           33%
The percentage is, for example, the number of people in the USA who are above 20 divided by the total USA count: 10 / (10 + 50 + 40).
How can I do this?
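To make the problem reproducible, here is a minimal sketch that builds the dataframe from the table above and works out a single cell by hand. The variable names (usa, usa_above_20, share) are only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "Canada"],
    "AgeRepartition": ["above 20", "less than 20", "above 50"] * 2,
    "Count": [10, 50, 40, 50, 10, 30],
})

# One cell worked by hand: the share of "above 20" within the USA rows.
usa = df[df["Country"] == "USA"]
usa_above_20 = usa.loc[usa["AgeRepartition"] == "above 20", "Count"].iloc[0]
share = usa_above_20 / usa["Count"].sum() * 100  # 10 / (10 + 50 + 40) * 100
print(share)
```

The goal is to get this value for every Country / AgeRepartition pair at once.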
CodePudding user response:
You can use pivot + pipe; in the pipe, divide by the sum per row and multiply by 100:
df2 = (df
       .pivot(index='Country', columns='AgeRepartition', values='Count')
       .pipe(lambda d: d.div(d.sum(axis=1), axis=0).mul(100))
      )
output:
AgeRepartition  above 20  above 50  less than 20
Country
Canada             55.56     33.33         11.11
USA                10.00     40.00         50.00
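If the values should be strings with a percent sign, as in the expected output, one possible follow-up step is sketched below. It assumes whole-number percentages are wanted and rounds to the nearest integer, so 55.56 becomes "56%" rather than the truncated "55%" shown in the question. The name formatted is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "Canada"],
    "AgeRepartition": ["above 20", "less than 20", "above 50"] * 2,
    "Count": [10, 50, 40, 50, 10, 30],
})

# Same pivot + pipe as above: each row is divided by its own total.
df2 = (df
       .pivot(index="Country", columns="AgeRepartition", values="Count")
       .pipe(lambda d: d.div(d.sum(axis=1), axis=0).mul(100)))

# Assumption: rounded whole-number percentages are acceptable.
# Each float is formatted as a string such as "56%".
formatted = df2.apply(lambda col: col.map("{:.0f}%".format))
print(formatted)
```

Keeping df2 numeric and formatting only at the end means the percentages remain usable for further calculations.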
CodePudding user response:
Another way is to use groupby + transform('sum') + rdiv to compute the percentages, assign to write them back into the Count column, and then pivot:
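# astype(int) truncates the decimals, so 55.56 becomes 55
out = (df.assign(Count=df.groupby('Country')['Count'].transform('sum').rdiv(df['Count']).mul(100)
                 .astype(int).astype(str).add('%'))
       # pivot arguments are keyword-only from pandas 2.0, so pivot(*df) no longer works
       .pivot(index='Country', columns='AgeRepartition', values='Count')
       .reset_index().rename_axis(columns=[None]))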
Output:
  Country above 20 above 50 less than 20
0  Canada      55%      33%          11%
1     USA      10%      40%          50%
CodePudding user response:
Use DataFrame.pivot and divide the values by the row sums; DataFrame.reindex is used to keep the original order of the rows and columns:
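# keyword arguments are required by pivot from pandas 2.0
df = (df.pivot(index='Country', columns='AgeRepartition', values='Count')
        .reindex(columns=df['AgeRepartition'].unique(), index=df['Country'].unique()))
# divide each row by its total, then scale to percent
df = df.div(df.sum(axis=1), axis=0).mul(100)
print(df)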
AgeRepartition   above 20  less than 20   above 50
Country
USA             10.000000     50.000000  40.000000
Canada          55.555556     11.111111  33.333333
Another solution, which keeps the original order in both the new index and the new columns, uses ordered categoricals:
# ordered categoricals remember the order of first appearance
df['Country'] = pd.Categorical(df['Country'],
                               ordered=True,
                               categories=df['Country'].unique())
df['AgeRepartition'] = pd.Categorical(df['AgeRepartition'],
                                      ordered=True,
                                      categories=df['AgeRepartition'].unique())
# keyword arguments are required by pivot from pandas 2.0
df = df.pivot(index='Country', columns='AgeRepartition', values='Count')
df = df.div(df.sum(axis=1), axis=0).mul(100)
print(df)
AgeRepartition   above 20  less than 20   above 50
Country
USA             10.000000     50.000000  40.000000
Canada          55.555556     11.111111  33.333333
CodePudding user response:
The easiest way is to use pivot_table from the pandas library:
import pandas as pd
# the dataframe must be passed as the first argument
df = pd.pivot_table(df, index=['Country'], columns='AgeRepartition', values='Count', aggfunc='first')
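Note that pivot_table on its own only reshapes the counts; it does not compute percentages. A minimal sketch of the remaining step, reusing the row-wise division from the other answers (the sample data and the names counts and pct are only illustrative):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "Canada", "Canada", "Canada"],
    "AgeRepartition": ["above 20", "less than 20", "above 50"] * 2,
    "Count": [10, 50, 40, 50, 10, 30],
})

# Step 1: reshape to one row per country and one column per age group.
counts = pd.pivot_table(df, index="Country", columns="AgeRepartition",
                        values="Count", aggfunc="first")

# Step 2: divide each row by its own total and scale to percent.
pct = counts.div(counts.sum(axis=1), axis=0).mul(100)
print(pct.round(2))
```

Since each Country / AgeRepartition pair appears only once here, aggfunc='first' and aggfunc='sum' give the same table; with duplicate pairs, 'sum' is probably the safer choice.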