Home > Mobile >  Calculating percentage share for each Pandas group and converting to new columns
Calculating percentage share for each Pandas group and converting to new columns

Time:10-04

I have this dataframe:

import pandas as pd


df = pd.DataFrame({"ID": ["123", "456", "123", "456", "123", "456", "456", "456", "456", "123"],
                    "number": [2,1,5,2,4,5,2,1,6,7],
                    "name": ["apple", "orange", "orange", "pear", "pear", "apple", "lemon", "apple", "lemon", "apple"]
})

    ID  number    name
0  123       2   apple
1  456       1  orange
2  123       5  orange
3  456       2    pear
4  123       4    pear
5  456       5   apple
6  456       2   lemon
7  456       1   apple
8  456       6   lemon
9  123       7   apple

I want to calculate percentage share for each element i name column, grouped by ID.

Final look of the DataFrame should have these columns:

ID, number, apple, orange, pear, lemon

So, values from name column are becoming column names. Values that those columns should have are percentage share, for example:

for ID 123 I have these number values: 2 5 4 7 = 18

So

apple has: (2 7) / 18 = 0.50
orange has: 5 / 18 = 0.27
pear has: 4 / 18 = 0.23
lemon has: 0   

ID   apple   orange   pear   lemon
123  0.5     0.27     0.23    0

CodePudding user response:

You can use a crosstab with normalize='index':

out = pd.crosstab(df['ID'], df['name'], values=df['number'],
                  aggfunc='sum', normalize='index')

output:

name     apple     lemon    orange      pear
ID                                          
123   0.500000  0.000000  0.277778  0.222222
456   0.352941  0.470588  0.058824  0.117647

CodePudding user response:

Use GroupBy.transform with Series.div by different groups:

df['Perc'] = (df.groupby(['ID','name'])['number'].transform('sum')
                .div(df.groupby('ID')['number'].transform('sum')))
print (df)
    ID  number    name      Perc
0  123       2   apple  0.500000
1  456       1  orange  0.058824
2  123       5  orange  0.277778
3  456       2    pear  0.117647
4  123       4    pear  0.222222
5  456       5   apple  0.352941
6  456       2   lemon  0.470588
7  456       1   apple  0.352941
8  456       6   lemon  0.470588
9  123       7   apple  0.500000

If need another ouput use DataFrame.pivot_table with division by sum:

df = df.pivot_table(index='ID', 
                    columns='name', 
                    values='number', 
                    aggfunc='sum', 
                    fill_value=0)
df = df.div(df.sum(axis=1), axis=0)
print (df)
name     apple     lemon    orange      pear
ID                                          
123   0.500000  0.000000  0.277778  0.222222
456   0.352941  0.470588  0.058824  0.117647
  • Related