Home > OS >  Pandas insert group mean values in a new column
Pandas insert group mean values in a new column

Time:06-30

In the dataframe below, I want to

  1. insert corresponding average Score of each State in a new column called avg
import pandas as pd
import numpy as np
 
# initialise data of lists.
data = {'State':['CA', 'CA', 'CA', 'CA','CA', 'TX', 'TX', 'TX','TX', 'TX', 'FL', 'FL','FL', 'FL', 'FL', 'AZ','AZ','AZ', 'AZ', 'AZ'],
        
         'Score':[3,3,3,2,1,4,2,2,1,2,1,1,1,1,2,2,5,5,5,5],
        'Income':[32112,34214,45575,22106,32612,34216,47515,22906,32112,34511,45525,12106,52112,54214,45015,22986,32112,34214,47518,22175],
        }

df = pd.DataFrame(data)

My attempt below:


mean_score = df.groupby(['State'])['Score'].agg(pd.Series.mean)
mean_score[:4]

# I need assistance here, instead of subsetting one by one I want efficiently append the mean_score to each corresponding state

az = df.loc[df['State'] == 'AZ']
az['avg'] = mean_score[0]
    
ca = df.loc[df['State'] == 'CA']
ca['avg'] = mean_score[1]

fl = df.loc[df['State'] == 'FL']
fl['avg'] = mean_score[2]
    
tx = df.loc[df['State'] == 'TX']
tx['avg'] = mean_score[3]

# concat all 4 dataframes
new_df = pd.concat([az,ca,fl,tx],axis = 0).reset_index(drop = True)

# export the output as a csv file 

new_df.to_csv("analysis_output.csv",index = False)

CodePudding user response:

No need for a complex pipeline, directly use GroupBy.transform:

df['avg'] = df.groupby('State')['Score'].transform('mean')

Output:

   State  Score  Income  avg
0     CA      3   32112  2.4
1     CA      3   34214  2.4
2     CA      3   45575  2.4
3     CA      2   22106  2.4
4     CA      1   32612  2.4
5     TX      4   34216  2.2
6     TX      2   47515  2.2
7     TX      2   22906  2.2
8     TX      1   32112  2.2
9     TX      2   34511  2.2
10    FL      1   45525  1.2
11    FL      1   12106  1.2
12    FL      1   52112  1.2
13    FL      1   54214  1.2
14    FL      2   45015  1.2
15    AZ      2   22986  4.4
16    AZ      5   32112  4.4
17    AZ      5   34214  4.4
18    AZ      5   47518  4.4
19    AZ      5   22175  4.4

CodePudding user response:

You can calculate the mean in one dataframe and then join using the same columns as in the groupby

group_keys = ['State']
mean_score = df.groupby(group_keys)['Score'].agg(pd.Series.mean)
pd.merge(df, mean_score, on=group_keys)
  • Related