Mean by group, exclude some rows-CodePudding

I have a below table and want to calculate mean by state, I would like to calculate mean only for the rows which has 1 as customer

Customer	State	Score	Output_Mean
0	GA	1
1	GA	2	2.5
1	GA	3	2.5
1	NY	9	8
1	NY	7	8
0	DC	6
0	DC	4

I have below code, how do I add customer criteria?

df['output_mean'] = (df.fillna({'state':'missing'}).groupby(['state'])['score'].transform(lambda x: x.mean()))

CodePudding user response：

You can just update where you want:

customer_1 = df['Customer'].eq(1)
df.loc[customer_1, 'Output_Mean'] = df[customer_1].groupby('State')['Score'].transform('mean')

For most (later) Pandas versions, you can forgo the customer_1 on the left hand side and do:

df['Output_Mean'] = df[customer_1].groupby('State')['Score'].transform('mean')

Or just a query without mask:

df['Output_Mean'] = df.query('Customer == 1').groupby('State')['Score'].transform('mean')

Output:

   Customer State  Score  Output_Mean
0         0   GA       1          NaN
1         1   GA       2          2.5
2         1   GA       3          2.5
3         1   NY       9          8.0
4         1   NY       7          8.0
5         0   DC       6          NaN
6         0   DC       4          NaN

CodePudding user response：

You can include a mask for customer==1 in the groupby (along with State) and use np.where to assign values to df['output_mean']:

mask = df['Customer']==1
df['output_mean'] = np.where(mask, df.fillna({'State':'missing'}).groupby([mask,'State'])['Score'].transform('mean'), np.nan)

Output:

   Customer State  Score  output_mean
0         0    GA      1          NaN
1         1    GA      2          2.5
2         1    GA      3          2.5
3         1    NY      9          8.0
4         1    NY      7          8.0
5         0    DC      6          NaN
6         0    DC      4          NaN