Pandas group by x, sort by y, select z, aggregating in case of multiple maximum values, multiple col-CodePudding

This is the second part of this question.

Suppose I have a dataframe df and I want to select x1 and x100, corresponding to the largest amount, grouped by group_id. If there are multiple rows with the largest amount, then I want to select medians of x1 and x100.

df = pd.DataFrame({'group_id' : [1,1,1,2,2,3,3,3,3],
                   'amount'  : [2,4,5,1,2,3,5,5,5],
                   'x1':[2,5,8,3,6,9,3,1,0],
                  'x100':[1,2,3,4,8,9,9,4,5]})

   group_id  amount  x1  x100
0         1       2   2     1
1         1       4   5     2
2         1       5   8     3
3         2       1   3     4
4         2       2   6     8
5         3       3   9     9
6         3       5   3     9
7         3       5   1     4
8         3       5   0     5

So the desired output looks like this:

          median_x1  median_x100
group_id                        
1               8.0          3.0
2               6.0          8.0
3               1.0          5.0

For only 2 columns (x1 and x100), I can simply add 1 line to @AndrejKesely solution to the previous question, something like this:

out = df.groupby("group_id").apply(
    lambda x: pd.Series(
        {"median_x1": (d := x.loc[x["amount"] == x["amount"].max()])['x1'].median(),
         "median_x100": d["x100"].median()}
    )
)

How to do this in a neat way, which will work for 100 columns, i.e., x1, x2 up to x100? Ideally, I do not want to copypaste one line 100 times and manually changing name of a column in an editor...

CodePudding user response：

Maybe something like this?

df.groupby('group_id').apply(
    lambda x: x[x['amount'] == x['amount'].max()
               ].drop(columns=['amount', 'group_id']).median())

You can also use names of columns instead of .drop():

df.groupby('group_id').apply(
    lambda x: x.loc[x['amount'] == x['amount'].max(), ['x1', 'x100']].median())