I have a pandas DataFrame like the following:
pd.DataFrame({'ID': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E'}, 'Count': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'Group': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'}})
which looks like this:
Change Count Group
0 A 1 A
1 B 2 A
2 C 3 A
3 D 4 B
4 E 5 B
I would like to group by the Group
column, find the sum of all the counts in the Count
column for that subset and then create a new column Proportion
which is calculated as the values in the Count
column for that subset divided by the sum of that subset.
The result should look result in this:
pd.DataFrame({'Change': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E'}, 'Count': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'Group': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'}, 'Proportion': {0: 0.167, 1: 0.333, 2: 0.5, 3: 0.444, 4: 0.555}})
and look like this:
Change Count Group Proportion
0 A 1 A 0.167
1 B 2 A 0.333
2 C 3 A 0.500
3 D 4 B 0.444
4 E 5 B 0.555
CodePudding user response:
Let us try groupby
with transform
df['new'] = df['Count'].div(df.groupby('Group')['Count'].transform('sum'))
df
Out[176]:
Change Count Group new
0 A 1 A 0.166667
1 B 2 A 0.333333
2 C 3 A 0.500000
3 D 4 B 0.444444
4 E 5 B 0.555556
CodePudding user response:
I would follow these steps:
- Get group totals:
groups = df[['Group', 'Count']].groupby('Group').sum().reset_index()
- Merge df with groups so each row has the group total count:
df = pd.merge(df, groups, on='Group')
- Compute proportions:
df['Proportion'] = df['Count'] / df['sum']