How can I apply multiple functions involving multiple columns of a pandas dataframe with grouby?-CodePudding

Considering the following datafrme:

	id	cat	date	max	score
1	s1	A	12/06	9	5.4
2	s1	B	12/06	10	5.4
3	s2	C	11/04	13	4.2
4	s2	D	11/04	28	10
5	s3	E	08/02	16	5.4
5	s3	F	08/02	6	5.4

I want to group by "id" and get the first of the columns [cat, date] but at the same time I want to get the mean of the max - score columns. I think it can be done with an aggregate:

but with aggregate I can't access all the columns of the dataframe.

id	cat	date	mean(max-score)
s1	A	12/06	4.1
s2	C	11/04	13.4
s3	E	08/02	5.6

CodePudding user response：

You first need to assign max-score to mean(max-score), then this is a simple groupby agg:

(df.assign(**{'mean(max-score)': df['max']-df['score']})
   .groupby('id', as_index=False)
   .agg({'cat': 'first', 'date': 'first', 'mean(max-score)': 'mean'})
)

output:

   id cat   date  mean(max-score)
0  s1   A  12/06              4.1
1  s2   C  11/04             13.4
2  s3   E  08/02              5.6

CodePudding user response：

From pandas 0.25 onwards, you can also use pd.namedAgg, a very useful function:

df['diff'] = df['max'] - df['score']
df.groupby('id').agg(cat=pd.NamedAgg(columns='cat',aggfunc='first'),
                         date=pd.NamedAgg(columns='date',aggfunc='first'),
                         mean(max-score)=pd.NamedAgg(columns='diff',aggfunc='mean'))