Home > OS >  column mean for subset of rows in R data.table
column mean for subset of rows in R data.table

Time:10-19

I have a data.table like this:

example <- data.table(
  id=rep(1:2,each=9),
  day=rep(rep(1:3,each=3),2),
  time=rep(1:3,6),
  mean_day=rnorm(18)
)

I need to compute the mean across multiple days, but different ones between id's.

To get the mean across days 1 to 2 for the first individual I've tried the following (inspired by this post):

example[id==1, mean_over_days := mean(mean_day,na.rm=T), by=(day %in% 1:2)]

> example
    id day time    mean_day mean_over_days
 1:  1   1    1 -1.53685184     -0.8908466
 2:  1   1    2  0.77445521     -0.8908466
 3:  1   1    3 -0.56048917     -0.8908466
 4:  1   2    1 -1.78388960     -0.8908466
 5:  1   2    2  0.01787129     -0.8908466
 6:  1   2    3 -2.25617538     -0.8908466
 7:  1   3    1 -0.44886190     -0.0955414
 8:  1   3    2 -1.31086985     -0.0955414
 9:  1   3    3  1.47310754     -0.0955414
10:  2   1    1  0.53560356             NA
11:  2   1    2  1.16654996             NA
12:  2   1    3 -0.06704728             NA
13:  2   2    1 -0.83897719             NA
14:  2   2    2 -0.85209939             NA
15:  2   2    3 -0.41392341             NA
16:  2   3    1 -0.03014190             NA
17:  2   3    2  0.43835822             NA
18:  2   3    3 -1.62432188             NA

I want all the lines for id==1 of column mean_over_days to have the same value (-0.8908466), but it happens that for day 3 this column has the mean over that day only.

How can I change the code to correct this?

CodePudding user response:

Don't subset in by it would create different groups for TRUE and FALSE values.

library(data.table)
example[id==1, mean_over_days := mean(mean_day[day %in% 1:2],na.rm=TRUE)][]

#    id day time    mean_day mean_over_days
# 1:  1   1    1 -0.56047565      0.4471527
# 2:  1   1    2 -0.23017749      0.4471527
# 3:  1   1    3  1.55870831      0.4471527
# 4:  1   2    1  0.07050839      0.4471527
# 5:  1   2    2  0.12928774      0.4471527
# 6:  1   2    3  1.71506499      0.4471527
# 7:  1   3    1  0.46091621      0.4471527
# 8:  1   3    2 -1.26506123      0.4471527
# 9:  1   3    3 -0.68685285      0.4471527
#10:  2   1    1 -0.44566197             NA
#11:  2   1    2  1.22408180             NA
#12:  2   1    3  0.35981383             NA
#13:  2   2    1  0.40077145             NA
#14:  2   2    2  0.11068272             NA
#15:  2   2    3 -0.55584113             NA
#16:  2   3    1  1.78691314             NA
#17:  2   3    2  0.49785048             NA
#18:  2   3    3 -1.96661716             NA

data

set.seed(123)
example <- data.table(
  id=rep(1:2,each=9),
  day=rep(rep(1:3,each=3),2),
  time=rep(1:3,6),
  mean_day=rnorm(18)
)
  • Related