I have a dataframe df
:
2019 2020 2021 2022
A 1 10 15 15 31
2 5 4 7 9
3 0.3 0.4 0.4 0.7
4 500 600 70 90
B 1 10 15 15 31
2 5 4 7 9
3 0.3 0.4 0.4 0.7
4 500 600 70 90
C 1 10 15 15 31
2 5 4 7 9
3 0.3 0.4 0.4 0.7
4 500 600 70 90
D 1 10 15 15 31
2 5 4 7 9
3 0.3 0.4 0.4 0.7
4 500 600 70 90
I am trying to group by the level 1 index, 1, 2, 3, 4
and assign different aggregation functions for those 1, 2, 3, 4
indexes so that 1
is aggregated by sum
, 2
by mean
, and so on. So that the end result would look like this:
2019 2020 2021 2022
1 40 ... ... # sum
2 5 ... ... # mean
3 0.3 ... ... # mean
4 2000 ... ... # sum
I tried:
df.groupby(level = 1).agg({'1':'sum', '2':'mean', '3':'sum', '4':'mean'})
But I get that none of 1, 2, 3, 4
are in columns which they are not, so I am not sure how should I proceed with this problem.
CodePudding user response:
You could use apply
with a custom function as follows:
import numpy as np
aggs = {1: np.sum, 2: np.mean, 3: np.mean, 4: np.sum}
def f(x):
func = aggs.get(x.name, np.sum)
return func(x)
df.groupby(level=1).apply(f)
The above code uses sum by default so 1
and 4
could be removed from aggs
without any different results. In this way, only groups that should be handled differently from the rest need to be specified.
Result:
2019 2020 2021 2022
1 40.0 60.0 60.0 124.0
2 5.0 4.0 7.0 9.0
3 0.3 0.4 0.4 0.7
4 2000.0 2400.0 280.0 360.0