Home > Enterprise >  How to calculate the mean for each individual and subtract every observation from that mean value?
How to calculate the mean for each individual and subtract every observation from that mean value?

Time:09-30

I am looking for a simple and fast solution to 1) calculate the mean SG (course) for each subid and 2) subtract each subid's performance (index) from this value. So subid 2 had 1 and 10 in the SG course. Now I take those two values and calculate the mean. Next I every single performance for the subid in course A to this value. How can I do this?

sectionIndex = [1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10]
subid = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2]
course = ['A','A','A','A','A','A','A','A','SG','SG', 'A','A','A','A','A','A','A','A','SG','SG']
performance = [3,6,1,6,5,6,7,8,9,13,1,2,3,3,4,4,6,3,1,10]
ddict = {
    'subid': subid,
    'sectionIndex': sectionIndex,
    'course': course,
    'peformance': performance

}

df = pd.DataFrame(ddict)

CodePudding user response:

Use GroupBy.transform with replace not matched values to missing values in Series.where with mean, last subtract peformance column:

s = df['peformance'].where(df['course'].eq('SG')).groupby(df['subid']).transform('mean')
df['new'] = s.sub(df['peformance'])

print (df)
    subid  sectionIndex course  peformance   new
0       1             1      A           3   8.0
1       1             2      A           6   5.0
2       1             3      A           1  10.0
3       1             4      A           6   5.0
4       1             5      A           5   6.0
5       1             6      A           6   5.0
6       1             7      A           7   4.0
7       1             8      A           8   3.0
8       1             9     SG           9   2.0
9       1            10     SG          13  -2.0
10      2             1      A           1   4.5
11      2             2      A           2   3.5
12      2             3      A           3   2.5
13      2             4      A           3   2.5
14      2             5      A           4   1.5
15      2             6      A           4   1.5
16      2             7      A           6  -0.5
17      2             8      A           3   2.5
18      2             9     SG           1   4.5
19      2            10     SG          10  -4.5

Details:

print (df['peformance'].where(df['course'].eq('SG')))
0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
5      NaN
6      NaN
7      NaN
8      9.0
9     13.0
10     NaN
11     NaN
12     NaN
13     NaN
14     NaN
15     NaN
16     NaN
17     NaN
18     1.0
19    10.0
Name: peformance, dtype: float64
    
  • Related