I have a below table and want to calculate mean by state, I would like to calculate mean only for the rows which has 1 as customer
Customer | State | Score | Output_Mean |
---|---|---|---|
0 | GA | 1 | |
1 | GA | 2 | 2.5 |
1 | GA | 3 | 2.5 |
1 | NY | 9 | 8 |
1 | NY | 7 | 8 |
0 | DC | 6 | |
0 | DC | 4 |
I have below code, how do I add customer criteria?
df['output_mean'] = (df.fillna({'state':'missing'}).groupby(['state'])['score'].transform(lambda x: x.mean()))
CodePudding user response:
You can just update where you want:
customer_1 = df['Customer'].eq(1)
df.loc[customer_1, 'Output_Mean'] = df[customer_1].groupby('State')['Score'].transform('mean')
For most (later) Pandas versions, you can forgo the customer_1
on the left hand side and do:
df['Output_Mean'] = df[customer_1].groupby('State')['Score'].transform('mean')
Or just a query
without mask:
df['Output_Mean'] = df.query('Customer == 1').groupby('State')['Score'].transform('mean')
Output:
Customer State Score Output_Mean
0 0 GA 1 NaN
1 1 GA 2 2.5
2 1 GA 3 2.5
3 1 NY 9 8.0
4 1 NY 7 8.0
5 0 DC 6 NaN
6 0 DC 4 NaN
CodePudding user response:
You can include a mask for customer==1
in the groupby
(along with State
) and use np.where
to assign values to df['output_mean']
:
mask = df['Customer']==1
df['output_mean'] = np.where(mask, df.fillna({'State':'missing'}).groupby([mask,'State'])['Score'].transform('mean'), np.nan)
Output:
Customer State Score output_mean
0 0 GA 1 NaN
1 1 GA 2 2.5
2 1 GA 3 2.5
3 1 NY 9 8.0
4 1 NY 7 8.0
5 0 DC 6 NaN
6 0 DC 4 NaN