Home > Enterprise >  Pandas combine .groubpy().sum() on different index levels in one result-dataframe
Pandas combine .groubpy().sum() on different index levels in one result-dataframe

Time:11-19

Consider the following example dataframe:

df = pd.DataFrame({
   "key1":['a','a','a','a','b','b','b','b'],
   "key2":[1,1,1,2,3,3,4,4],
   "key3":['R','S','T','U','V','W','X','Y'],
   "count":[10,20,10,5,12,13,8,22]
})

  key1  key2 key3  count
0    a     1    R     10
1    a     1    S     20
2    a     1    T     10
3    a     2    U      5
4    b     3    V     12
5    b     3    W     13
6    b     4    X      8
7    b     4    Y     22

I want the sums of the "count" column on the different key levels. One of several ways to represent the wanted result is like so:

key1  count_key1 key2 count_key2 key3  count
   a          45    1         40    R     10
   a          45    1         40    S     20
   a          45    1          5    T     10
   a          45    2          5    U      5
   b          55    3         25    V     12
   b          55    3         25    W     13
   b          55    4         30    X      8
   b          55    4         30    Y     22

groupby()sum() seems to be a good starting point, but nothing more:

df.set_index(["key1","key2","key3"]).groupby(level=["key1","key2","key3"]).sum()
                count
key1 key2 key3      
a    1    R        10
          S        20
          T        10
     2    U         5
b    3    V        12
          W        13
     4    X         8
          Y        22

df.set_index(["key1","key2","key3"]).groupby(level=["key1","key2"]).sum()
           count
key1 key2      
a    1        40
     2         5
b    3        25
     4        30
             
df.set_index(["key1","key2","key3"]).groupby(level=["key1"]).sum()
      count
key1      
a        45
b        55

CodePudding user response:

You would need to use a loop and groupby.transform:

for key in ['key1', 'key2']:
    df[f'count_{key}'] = df.groupby(key)['count'].transform('sum')

Output:

  key1  key2 key3  count  count_key1  count_key2
0    a     1    R     10          45          40
1    a     1    S     20          45          40
2    a     1    T     10          45          40
3    a     2    U      5          45           5
4    b     3    V     12          55          25
5    b     3    W     13          55          25
6    b     4    X      8          55          30
7    b     4    Y     22          55          30

If position matters:

for key in ['key1', 'key2']:
    if key in df.columns:
        df.insert(df.columns.get_loc(key) 1,
                  f'count_{key}',
                  df.groupby(key)['count'].transform('sum')
                 )

Output:

  key1  count_key1  key2  count_key2 key3  count
0    a          45     1          40    R     10
1    a          45     1          40    S     20
2    a          45     1          40    T     10
3    a          45     2           5    U      5
4    b          55     3          25    V     12
5    b          55     3          25    W     13
6    b          55     4          30    X      8
7    b          55     4          30    Y     22
  • Related