Computation of average of a column on rolling in R data.table-CodePudding

library(data.table)
DT <- data.table(
      N = 1:16,
  x = c(11,11,11,11,11,11,11,11,21,21,21,21,21,21,21,21), 
  y = c(1,2,3,4,4,4,4,4,1,2,3,4,4,4,4,4), 
  z = c(53,71,27,64,43,62,61,85,44,56,23,37,31,48,80,38)
)

I want to get a column with values as average of z with respect to y for each x values like

N   x   y   z   Roll Mean 
1   11  1   53  NA
2   11  2   71  53
3   11  3   27  62
4   11  4   64  50.33333333
5   11  4   43  53.75
6   11  4   62  51.25
7   11  4   61  49
8   11  4   85  57.5
9   21  1   44  NA
10  21  2   56  44
11  21  3   23  50
12  21  4   37  41
13  21  4   31  40
14  21  4   48  36.75
15  21  4   80  34.75
16  21  4   38  49

for example when N=2 x=11, y=2 the roll mean = average of previous 1 z term = mean(53) when N=3, x=11, y=3 the roll mean = average of previous 1 and 2 z term = mean(53&71 ) when N=4 x=11, y=4 the roll mean = average of previous 1, 2 and 3rd z terms = mean(53,71,27 ) then N = 5 to 8 I have to get averages of previous 4 values. I wrote the code

DT[, RollingAvg := frollapply(z,4, mean), .(x)]

gives the output

    N  x y  z RollingAvg
1:  1 11 1 53         NA
2:  2 11 1 71         NA
3:  3 11 1 27         NA
4:  4 11 1 64      53.75
5:  5 11 1 43      51.25
6:  6 11 1 62      49.00
7:  7 11 1 61      57.50
8:  8 11 1 85      62.75
9:  9 21 1 44         NA
10: 10 21 1 56         NA
11: 11 21 1 23         NA
12: 12 21 1 37      40.00
13: 13 21 1 31      36.75
14: 14 21 1 48      34.75
15: 15 21 1 80      49.00
16: 16 21 1 38      49.25

How can I get the correct out put

CodePudding user response：

We may use rollapply with partial = TRUE

library(zoo)
library(data.table)
DT[, RollingAvg := shift(rollapply(z, 4, mean, 
    partial = TRUE, align = 'right')), by = x]

-output

> DT
        N     x     y     z RollingAvg
    <int> <num> <num> <num>      <num>
 1:     1    11     1 53.00         NA
 2:     2    11     2 71.00   53.00000
 3:     3    11     3 27.00   62.00000
 4:     4    11     4 64.00   50.33333
 5:     5    11     4 43.00   53.75000
 6:     6    11     4 62.00   51.25000
 7:     7    11     4 61.00   49.00000
 8:     8    11     1 85.00   57.50000
 9:     9    21     2 44.00         NA
10:    10    21     3 56.00   44.00000
11:    11    21     4 23.00   50.00000
12:    12    21     4 37.00   41.00000
13:    13    21     4 31.00   40.00000
14:    14    21     4 48.00   36.75000
15:    15    21     1 80.38   34.75000
16:    16    21     2 53.00   49.09500

In addition, frollmean can have adaptive option if we specify n as a vector of values

DT[, RollingAvg := shift(frollmean(z, rep(1:4, c(1, 1, 1, .N-3)), 
      adaptive = TRUE)), by = x]

-output

> DT
        N     x     y     z RollingAvg
    <int> <num> <num> <num>      <num>
 1:     1    11     1 53.00         NA
 2:     2    11     2 71.00   53.00000
 3:     3    11     3 27.00   62.00000
 4:     4    11     4 64.00   50.33333
 5:     5    11     4 43.00   53.75000
 6:     6    11     4 62.00   51.25000
 7:     7    11     4 61.00   49.00000
 8:     8    11     1 85.00   57.50000
 9:     9    21     2 44.00         NA
10:    10    21     3 56.00   44.00000
11:    11    21     4 23.00   50.00000
12:    12    21     4 37.00   41.00000
13:    13    21     4 31.00   40.00000
14:    14    21     4 48.00   36.75000
15:    15    21     1 80.38   34.75000
16:    16    21     2 53.00   49.09500

CodePudding user response：

DT[, rm := shift(as.data.frame(frollmean(z, 1:4))[cbind(1:.N, y)])]
# OR 
DT[, rm := shift(unlist(frollmean(z, 1:4))[.I   (y-1)*.N])]
DT[y == 1L, rm := NA_real_]
#         N     x     y     z       rm
#     <int> <num> <num> <num>    <num>
#  1:     1    11     1    53       NA
#  2:     2    11     2    71 53.00000
#  3:     3    11     3    27 62.00000
#  4:     4    11     4    64 50.33333
#  5:     5    11     4    43 53.75000
#  6:     6    11     4    62 51.25000
#  7:     7    11     4    61 49.00000
#  8:     8    11     4    85 57.50000
#  9:     9    21     1    44       NA
# 10:    10    21     2    56 44.00000
# 11:    11    21     3    23 50.00000
# 12:    12    21     4    37 41.00000
# 13:    13    21     4    31 40.00000
# 14:    14    21     4    48 36.75000
# 15:    15    21     4    80 34.75000
# 16:    16    21     4    38 49.00000

Properly reproducible data:

DT <- data.table(
  N = 1:16,
  x = c(11,11,11,11,11,11,11,11,21,21,21,21,21,21,21,21), 
  y = c(1,2,3,4,4,4,4,4,1,2,3,4,4,4,4,4), 
  z = c(53,71,27,64,43,62,61,85,44,56,23,37,31,48,80,38)
)