Deviations from group mean by id

I have some data where time is nested within individuals:

set.seed(124)
x = rnorm(25)
df <- data.frame(id=rep(1:5, each=5), time=1:5, x=x)

What would be a base R solution to append a column holding each observation's deviation from the same person's average across time (i.e., centering around the person's mean)? The output should look like this (x.c is the appended column with the deviations from the person's mean):

   id time           x           x.c
1   1    1 -1.38507062  3.814056e-07
2   1    2  0.03832318  1.423394e+00
3   1    3 -0.76303016  6.220408e-01
4   1    4  0.21230614  1.597377e+00
5   1    5  1.42553797  2.810609e+00
6   2    1  0.74447982  2.233398e-08
7   2    2  0.70022940 -4.425040e-02
8   2    3 -0.22935461 -9.738344e-01
9   2    4  0.19709386 -5.473859e-01
10  2    5  1.20715377  4.626740e-01
11  3    1  0.31833673  2.642477e-08
12  3    2 -1.42379885 -1.742136e+00
13  3    3 -0.40509086 -7.234276e-01
14  3    4  0.99538657  6.770499e-01
15  3    5  0.95881779  6.404811e-01
16  4    1  0.91808790 -3.680049e-09
17  4    2 -0.15096960 -1.069058e+00
18  4    3 -1.22306879 -2.141157e+00
19  4    4 -0.86882429 -1.786912e+00
20  4    5 -1.04248536 -1.960573e+00
21  5    1 -1.10363778  2.169331e-07
22  5    2  0.44418506  1.547823e+00
23  5    3 -0.20495061  8.986874e-01
24  5    4  1.67563243  2.779270e+00
25  5    5 -0.13132225  9.723158e-01

I know the tidyverse solution is group_by but I would like a base R solution. Thank you!

CodePudding user response:

Here is an alternative base R approach using aggregate:

df1 <- merge(df, aggregate(x ~ id, data = df, mean), 
      by = "id", suffixes = c("", "mean"))

df1$x.c <- df1$x - df1$xmean
df1[-4]

   id time           x        x.c
1   1    1 -1.38507062 -1.2906839
2   1    2  0.03832318  0.1327099
3   1    3 -0.76303016 -0.6686435
4   1    4  0.21230614  0.3066928
5   1    5  1.42553797  1.5199247
6   2    1  0.74447982  0.2205594
7   2    2  0.70022940  0.1763090
8   2    3 -0.22935461 -0.7532751
9   2    4  0.19709386 -0.3268266
10  2    5  1.20715377  0.6832333
11  3    1  0.31833673  0.2296065
12  3    2 -1.42379885 -1.5125291
13  3    3 -0.40509086 -0.4938211
14  3    4  0.99538657  0.9066563
15  3    5  0.95881779  0.8700875
16  4    1  0.91808790  1.3915399
17  4    2 -0.15096960  0.3224824
18  4    3 -1.22306879 -0.7496168
19  4    4 -0.86882429 -0.3953723
20  4    5 -1.04248536 -0.5690333
21  5    1 -1.10363778 -1.2396192
22  5    2  0.44418506  0.3082037
23  5    3 -0.20495061 -0.3409320
24  5    4  1.67563243  1.5396511
25  5    5 -0.13132225 -0.2673036
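
One quick way to sanity-check a centered column: deviations from a group mean should sum to zero within each group. A minimal sketch, rebuilding the question's data as df and repeating the merge/aggregate steps; the names means and sums are only illustrative:

```r
set.seed(124)
df <- data.frame(id = rep(1:5, each = 5), time = 1:5, x = rnorm(25))

# Person means via aggregate(), merged back onto the long data
means <- aggregate(x ~ id, data = df, FUN = mean)
df1 <- merge(df, means, by = "id", suffixes = c("", "mean"))
df1$x.c <- df1$x - df1$xmean

# Within-person sums of the deviations should all be numerically zero
sums <- tapply(df1$x.c, df1$id, sum)
all(abs(sums) < 1e-12)
```

Note that merge() sorts the result by the key column by default, so if the input rows were not already ordered by id, the output order can differ from the input.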

CodePudding user response:

A base R solution is to compute the mean by 'id' with ave() and subtract it from the individual observations of 'x':

df1$x.c <- with(df1, x - ave(x, id))
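
For context, ave() returns a vector the same length as its input, with each element replaced by its group's mean (mean is the default FUN), so the subtraction lines up row by row and the original row order is kept. A small self-contained sketch, assuming the question's data is stored as df:

```r
set.seed(124)
df <- data.frame(id = rep(1:5, each = 5), time = 1:5, x = rnorm(25))

# ave() repeats each person's mean on every row belonging to that person
person_mean <- ave(df$x, df$id)

# Centering: subtract the person mean from each observation
df$x.c <- df$x - person_mean

head(df, 5)
```

Because nothing is merged or re-sorted, this also works when the rows are not grouped by id.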