How to iterate within a data.table

Time:09-10

I want to compute statistics that are both grouped and cumulative. In the simulation below I have 10 observations per day for 5 days, and for each day I compute the standard deviation of all observations up to and including that day.

library(data.table)
library(tictoc)

DURATION <- 5
DAILY_N <- 10
N_PER_COND <- DURATION * DAILY_N

dt <- 
    data.table(
      day = rep(1:DURATION, each = DAILY_N),
      x = rgamma(n=N_PER_COND, shape=5, scale=25)
    )

cum_stdevs <- vector('double', DURATION)

tic()
for (i in seq_along(cum_stdevs)) {
    cum_x <- dt[day <= i, x]
    cum_stdevs[i] <- sd(cum_x)
}
toc()

Is there a way to perform this kind of operation within data.table without resorting to a for loop?

Even with the for loop, this was about 14x faster than doing the same thing with standard data frames.

CodePudding user response:

I guess you can try sapply within data.table, like below:

cum_stdevs <- dt[, sapply(seq_along(cum_stdevs), function(k) sd(x[day <= k]))]
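If re-scanning x for every day gets slow on larger data, another option might be to aggregate once per day and accumulate the pieces. This is only a sketch, not something from the question: the set.seed call and the names daily, cum_n, cum_s, cum_ss and cum_sd are my own assumptions, and the sum-of-squares identity it relies on can lose precision when the mean is large relative to the spread.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed so the comparison is reproducible

DURATION <- 5
DAILY_N <- 10

dt <- data.table(
  day = rep(1:DURATION, each = DAILY_N),
  x = rgamma(n = DURATION * DAILY_N, shape = 5, scale = 25)
)

# Reference result: the sapply approach from the answer above
sd_sapply <- dt[, sapply(seq_len(DURATION), function(k) sd(x[day <= k]))]

# One pass over the data: per-day count, sum, and sum of squares
daily <- dt[, .(n = .N, s = sum(x), ss = sum(x^2)), keyby = day]

# Accumulate across days, then apply the sample-variance identity
# var = (sum(x^2) - sum(x)^2 / n) / (n - 1)
daily[, `:=`(cum_n = cumsum(n), cum_s = cumsum(s), cum_ss = cumsum(ss))]
daily[, cum_sd := sqrt((cum_ss - cum_s^2 / cum_n) / (cum_n - 1))]

# Compare the two approaches up to floating-point tolerance
all.equal(sd_sapply, daily$cum_sd)
```

The sapply version is simpler and is probably fine for 5 days. The aggregated version touches each row once, which may matter when there are many days.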