I am fairly new to python and I have the following dataframe
setting_id subject_id seconds result_id owner_id average duration_id
0 7 1 0 1680.5 2.0 24.000 1.0
1 7 1 3600 1690.5 2.0 46.000 2.0
2 7 1 10800 1700.5 2.0 101.000 4.0
3 7 2 0 1682.5 2.0 12.500 1.0
4 7 2 3600 1692.5 2.0 33.500 2.0
5 7 2 10800 1702.5 2.0 86.500 4.0
6 7 3 0 1684.5 2.0 8.500 1.0
7 7 3 3600 1694.5 2.0 15.000 2.0
8 7 3 10800 1704.5 2.0 34.000 4.0
What I need to do is Calculate the deviation (%) from averages with a "seconds"-value not equal to 0 from those averages with a seconds value of zero, where the subject_id and Setting_id are the same
i.e. setting_id ==7 & subject_id ==1
would be:
(result/baseline)*100
------> for 3600 seconds: (46/24)*100 = 192%
------> for 10800 seconds: (101/24)*100 = 421%
.... baseline = average-result with a seconds value of 0
.... result = average-result with a seconds value other than 0
The resulting df should look like this
setting_id subject_id seconds owner_id average deviation duration_id
0 7 1 0 2 24 0 1
1 7 1 3600 2 46 192 2
2 7 1 10800 2 101 421 4
I want to use these calculations then to plot a regression graph (with seaborn) of deviations from baseline
I have played around with this df for 2 days now and tried different forloops but I just can´t figure out the correct way.
CodePudding user response:
You can use:
# identify rows with 0
m = df['seconds'].eq(0)
# compute the sum of rows with 0
s = (df['average'].where(m)
.groupby([df['setting_id'], df['subject_id']])
.sum()
)
# compute the deviation per group
deviation = (
df[['setting_id', 'subject_id']]
.merge(s, left_on=['setting_id', 'subject_id'], right_index=True, how='left')
['average']
.rdiv(df['average']).mul(100)
.round().astype(int) # optional
.mask(m, 0)
)
df['deviation'] = deviation
# or
# out = df.assign(deviation=deviation)
Output:
setting_id subject_id seconds result_id owner_id average duration_id deviation
0 7 1 0 1680.5 2.0 24.0 1.0 0
1 7 1 3600 1690.5 2.0 46.0 2.0 192
2 7 1 10800 1700.5 2.0 101.0 4.0 421
3 7 2 0 1682.5 2.0 12.5 1.0 0
4 7 2 3600 1692.5 2.0 33.5 2.0 268
5 7 2 10800 1702.5 2.0 86.5 4.0 692
6 7 3 0 1684.5 2.0 8.5 1.0 0
7 7 3 3600 1694.5 2.0 15.0 2.0 176
8 7 3 10800 1704.5 2.0 34.0 4.0 400