I am looking for a simple and fast solution to 1) calculate the mean SG (course) for each subid and 2) subtract each subid's performance (index) from this value. So subid 2 had 1 and 10 in the SG course. Now I take those two values and calculate the mean. Next I every single performance for the subid in course A to this value. How can I do this?
sectionIndex = [1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10]
subid = [1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2]
course = ['A','A','A','A','A','A','A','A','SG','SG', 'A','A','A','A','A','A','A','A','SG','SG']
performance = [3,6,1,6,5,6,7,8,9,13,1,2,3,3,4,4,6,3,1,10]
ddict = {
'subid': subid,
'sectionIndex': sectionIndex,
'course': course,
'peformance': performance
}
df = pd.DataFrame(ddict)
CodePudding user response:
Use GroupBy.transform
with replace not matched values to missing values in Series.where
with mean
, last subtract peformance
column:
s = df['peformance'].where(df['course'].eq('SG')).groupby(df['subid']).transform('mean')
df['new'] = s.sub(df['peformance'])
print (df)
subid sectionIndex course peformance new
0 1 1 A 3 8.0
1 1 2 A 6 5.0
2 1 3 A 1 10.0
3 1 4 A 6 5.0
4 1 5 A 5 6.0
5 1 6 A 6 5.0
6 1 7 A 7 4.0
7 1 8 A 8 3.0
8 1 9 SG 9 2.0
9 1 10 SG 13 -2.0
10 2 1 A 1 4.5
11 2 2 A 2 3.5
12 2 3 A 3 2.5
13 2 4 A 3 2.5
14 2 5 A 4 1.5
15 2 6 A 4 1.5
16 2 7 A 6 -0.5
17 2 8 A 3 2.5
18 2 9 SG 1 4.5
19 2 10 SG 10 -4.5
Details:
print (df['peformance'].where(df['course'].eq('SG')))
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 9.0
9 13.0
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
18 1.0
19 10.0
Name: peformance, dtype: float64