Grouping & aggregating on level 1 index & assigning different aggregation functions using pandas-CodePudding

I have a dataframe df:

                2019            2020            2021        2022
A       1       10              15              15          31
        2       5               4               7           9
        3       0.3             0.4             0.4         0.7
        4       500             600             70          90
B       1       10              15              15          31
        2       5               4               7           9
        3       0.3             0.4             0.4         0.7
        4       500             600             70          90
C       1       10              15              15          31
        2       5               4               7           9
        3       0.3             0.4             0.4         0.7
        4       500             600             70          90
D       1       10              15              15          31
        2       5               4               7           9
        3       0.3             0.4             0.4         0.7
        4       500             600             70          90

I am trying to group by the level 1 index, 1, 2, 3, 4 and assign different aggregation functions for those 1, 2, 3, 4 indexes so that 1 is aggregated by sum, 2 by mean, and so on. So that the end result would look like this:

            2019            2020            2021        2022
1           40              ...             ...         # sum
2           5               ...             ...         # mean
3           0.3             ...             ...         # mean
4           2000            ...             ...         # sum

I tried:

df.groupby(level = 1).agg({'1':'sum', '2':'mean', '3':'sum', '4':'mean'})

But I get that none of 1, 2, 3, 4 are in columns which they are not, so I am not sure how should I proceed with this problem.

CodePudding user response：

You could use apply with a custom function as follows:

import numpy as np

aggs = {1: np.sum, 2: np.mean, 3: np.mean, 4: np.sum}
def f(x):
    func = aggs.get(x.name, np.sum)
    return func(x)
     
df.groupby(level=1).apply(f)

The above code uses sum by default so 1 and 4 could be removed from aggs without any different results. In this way, only groups that should be handled differently from the rest need to be specified.

Result:

      2019    2020   2021    2022               
1     40.0    60.0   60.0   124.0
2      5.0     4.0    7.0     9.0
3      0.3     0.4    0.4     0.7
4   2000.0  2400.0  280.0   360.0