Home > Back-end >  Divide Column by Sum of Data Subset in Pandas
Divide Column by Sum of Data Subset in Pandas

Time:06-02

I have a pandas DataFrame like the following:

pd.DataFrame({'ID': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E'}, 'Count': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'Group': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'}})

which looks like this:

Change  Count   Group
0   A   1   A
1   B   2   A
2   C   3   A
3   D   4   B
4   E   5   B

I would like to group by the Group column, find the sum of all the counts in the Count column for that subset and then create a new column Proportion which is calculated as the values in the Count column for that subset divided by the sum of that subset.

The result should look result in this:

pd.DataFrame({'Change': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E'}, 'Count': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'Group': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'}, 'Proportion': {0: 0.167, 1: 0.333, 2: 0.5, 3: 0.444, 4: 0.555}})

and look like this:

Change  Count   Group   Proportion
0   A   1   A   0.167
1   B   2   A   0.333
2   C   3   A   0.500
3   D   4   B   0.444
4   E   5   B   0.555

CodePudding user response:

Let us try groupby with transform

df['new'] = df['Count'].div(df.groupby('Group')['Count'].transform('sum'))
df
Out[176]: 
  Change  Count Group       new
0      A      1     A  0.166667
1      B      2     A  0.333333
2      C      3     A  0.500000
3      D      4     B  0.444444
4      E      5     B  0.555556

CodePudding user response:

I would follow these steps:

  • Get group totals:
groups = df[['Group', 'Count']].groupby('Group').sum().reset_index()
  • Merge df with groups so each row has the group total count:
df = pd.merge(df, groups, on='Group')
  • Compute proportions:
df['Proportion'] = df['Count'] / df['sum']
  • Related