I am working on this table:
plaza | station | name | ID | g1 | g2 | g3 | location | g_avg |
---|---|---|---|---|---|---|---|---|
45 | 1 | aaa | 45_1_aaa | 4 | 5 | 6 | sun | 5.00 |
45 | 2 | aaa | 45_2_aaa | 1 | 5 | 2 | sun | 2.67 |
46 | 1 | bbb | 46_1_bbb | 8 | 9 | 3 | sun | 6.67 |
47 | 1 | ccc | 47_1_ccc | 4 | 5 | 0 | sun | 3.00 |
47 | 2 | ccc | 47_2_ccc | 3 | 4 | 7 | sun | 4.67 |
45 | 3 | aaa | 45_3_aaa | 3 | 4 | 0 | sun | 2.33 |
34 | 1 | ddd | 34_1_ddd | 19 | 29 | 28 | moon | 25.33 |
47 | 3 | eee | 47_3_eee | 1 | 2 | 2 | moon | 1.67 |
48 | 1 | fff | 48_1_fff | 1 | 2 | 1 | moon | 1.33 |
49 | 1 | ggg | 49_1_ggg | 2 | 1 | 1 | moon | 1.33 |
49 | 2 | ggg | 49_2_ggg | 1 | 1 | 1 | moon | 1.00 |
I am running a groupby on the location and ID columns and I want to get, for each row, the g_avg divided by the sum of g_average for rows in that location.
The code im using is this
data.groupby(['location', 'ID']).apply(lambda x: len(x['g_avg']) /x['g_avg'].sum() * 100)
and this is the result I get:
What I was hoping to achieve is this:
Where am I going wrong?
CodePudding user response:
You can use groupby.transform
to find the total g_avg
for each location; then divide g_avg
values by these sums:
df['ratio'] = df['g_avg'] / df.groupby('location')['g_avg'].transform('sum') * 100
out = df[['location','ID','ratio']]
Output:
location ID ratio
0 sun 45_1_aaa 20.542317
1 sun 45_2_aaa 10.969597
2 sun 46_1_bbb 27.403451
3 sun 47_1_ccc 12.325390
4 sun 47_2_ccc 19.186524
5 sun 45_3_aaa 9.572720
6 moon 34_1_ddd 82.615786
7 moon 47_3_eee 5.446836
8 moon 48_1_fff 4.337900
9 moon 49_1_ggg 4.337900
10 moon 49_2_ggg 3.261579