How to calculate the mean for each individual and subtract every observation from that mean value?-CodePudding

I am looking for a simple and fast solution to 1) calculate the mean SG (course) for each subid and 2) subtract each subid's performance (index) from this value. So subid 2 had 1 and 10 in the SG course. Now I take those two values and calculate the mean. Next I every single performance for the subid in course A to this value. How can I do this?

sectionIndex = [1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10]
subid = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2]
course = ['A','A','A','A','A','A','A','A','SG','SG', 'A','A','A','A','A','A','A','A','SG','SG']
performance = [3,6,1,6,5,6,7,8,9,13,1,2,3,3,4,4,6,3,1,10]
ddict = {
    'subid': subid,
    'sectionIndex': sectionIndex,
    'course': course,
    'peformance': performance

}

df = pd.DataFrame(ddict)

CodePudding user response：

Use GroupBy.transform with replace not matched values to missing values in Series.where with mean, last subtract peformance column:

s = df['peformance'].where(df['course'].eq('SG')).groupby(df['subid']).transform('mean')
df['new'] = s.sub(df['peformance'])

print (df)
    subid  sectionIndex course  peformance   new
0       1             1      A           3   8.0
1       1             2      A           6   5.0
2       1             3      A           1  10.0
3       1             4      A           6   5.0
4       1             5      A           5   6.0
5       1             6      A           6   5.0
6       1             7      A           7   4.0
7       1             8      A           8   3.0
8       1             9     SG           9   2.0
9       1            10     SG          13  -2.0
10      2             1      A           1   4.5
11      2             2      A           2   3.5
12      2             3      A           3   2.5
13      2             4      A           3   2.5
14      2             5      A           4   1.5
15      2             6      A           4   1.5
16      2             7      A           6  -0.5
17      2             8      A           3   2.5
18      2             9     SG           1   4.5
19      2            10     SG          10  -4.5

Details:

print (df['peformance'].where(df['course'].eq('SG')))
0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
5      NaN
6      NaN
7      NaN
8      9.0
9     13.0
10     NaN
11     NaN
12     NaN
13     NaN
14     NaN
15     NaN
16     NaN
17     NaN
18     1.0
19    10.0
Name: peformance, dtype: float64