In the dataframe below, I want to
- insert corresponding average
Score
of eachState
in a new column calledavg
import pandas as pd
import numpy as np
# initialise data of lists.
data = {'State':['CA', 'CA', 'CA', 'CA','CA', 'TX', 'TX', 'TX','TX', 'TX', 'FL', 'FL','FL', 'FL', 'FL', 'AZ','AZ','AZ', 'AZ', 'AZ'],
'Score':[3,3,3,2,1,4,2,2,1,2,1,1,1,1,2,2,5,5,5,5],
'Income':[32112,34214,45575,22106,32612,34216,47515,22906,32112,34511,45525,12106,52112,54214,45015,22986,32112,34214,47518,22175],
}
df = pd.DataFrame(data)
My attempt below:
mean_score = df.groupby(['State'])['Score'].agg(pd.Series.mean)
mean_score[:4]
# I need assistance here, instead of subsetting one by one I want efficiently append the mean_score to each corresponding state
az = df.loc[df['State'] == 'AZ']
az['avg'] = mean_score[0]
ca = df.loc[df['State'] == 'CA']
ca['avg'] = mean_score[1]
fl = df.loc[df['State'] == 'FL']
fl['avg'] = mean_score[2]
tx = df.loc[df['State'] == 'TX']
tx['avg'] = mean_score[3]
# concat all 4 dataframes
new_df = pd.concat([az,ca,fl,tx],axis = 0).reset_index(drop = True)
# export the output as a csv file
new_df.to_csv("analysis_output.csv",index = False)
CodePudding user response:
No need for a complex pipeline, directly use GroupBy.transform
:
df['avg'] = df.groupby('State')['Score'].transform('mean')
Output:
State Score Income avg
0 CA 3 32112 2.4
1 CA 3 34214 2.4
2 CA 3 45575 2.4
3 CA 2 22106 2.4
4 CA 1 32612 2.4
5 TX 4 34216 2.2
6 TX 2 47515 2.2
7 TX 2 22906 2.2
8 TX 1 32112 2.2
9 TX 2 34511 2.2
10 FL 1 45525 1.2
11 FL 1 12106 1.2
12 FL 1 52112 1.2
13 FL 1 54214 1.2
14 FL 2 45015 1.2
15 AZ 2 22986 4.4
16 AZ 5 32112 4.4
17 AZ 5 34214 4.4
18 AZ 5 47518 4.4
19 AZ 5 22175 4.4
CodePudding user response:
You can calculate the mean in one dataframe and then join using the same columns as in the groupby
group_keys = ['State']
mean_score = df.groupby(group_keys)['Score'].agg(pd.Series.mean)
pd.merge(df, mean_score, on=group_keys)