Consider the following example dataframe:
df = pd.DataFrame({
"key1":['a','a','a','a','b','b','b','b'],
"key2":[1,1,1,2,3,3,4,4],
"key3":['R','S','T','U','V','W','X','Y'],
"count":[10,20,10,5,12,13,8,22]
})
key1 key2 key3 count
0 a 1 R 10
1 a 1 S 20
2 a 1 T 10
3 a 2 U 5
4 b 3 V 12
5 b 3 W 13
6 b 4 X 8
7 b 4 Y 22
I want the sums of the "count" column on the different key levels. One of several ways to represent the wanted result is like so:
key1 count_key1 key2 count_key2 key3 count
a 45 1 40 R 10
a 45 1 40 S 20
a 45 1 5 T 10
a 45 2 5 U 5
b 55 3 25 V 12
b 55 3 25 W 13
b 55 4 30 X 8
b 55 4 30 Y 22
groupby()sum() seems to be a good starting point, but nothing more:
df.set_index(["key1","key2","key3"]).groupby(level=["key1","key2","key3"]).sum()
count
key1 key2 key3
a 1 R 10
S 20
T 10
2 U 5
b 3 V 12
W 13
4 X 8
Y 22
df.set_index(["key1","key2","key3"]).groupby(level=["key1","key2"]).sum()
count
key1 key2
a 1 40
2 5
b 3 25
4 30
df.set_index(["key1","key2","key3"]).groupby(level=["key1"]).sum()
count
key1
a 45
b 55
CodePudding user response:
You would need to use a loop and groupby.transform
:
for key in ['key1', 'key2']:
df[f'count_{key}'] = df.groupby(key)['count'].transform('sum')
Output:
key1 key2 key3 count count_key1 count_key2
0 a 1 R 10 45 40
1 a 1 S 20 45 40
2 a 1 T 10 45 40
3 a 2 U 5 45 5
4 b 3 V 12 55 25
5 b 3 W 13 55 25
6 b 4 X 8 55 30
7 b 4 Y 22 55 30
If position matters:
for key in ['key1', 'key2']:
if key in df.columns:
df.insert(df.columns.get_loc(key) 1,
f'count_{key}',
df.groupby(key)['count'].transform('sum')
)
Output:
key1 count_key1 key2 count_key2 key3 count
0 a 45 1 40 R 10
1 a 45 1 40 S 20
2 a 45 1 40 T 10
3 a 45 2 5 U 5
4 b 55 3 25 V 12
5 b 55 3 25 W 13
6 b 55 4 30 X 8
7 b 55 4 30 Y 22