I want to compute statistics that are both grouped and cumulative. In the simulation below I have 10 observations per day for 5 days, and for each day I compute the standard deviation of all observations up to and including that day.
    library(data.table)
    library(tictoc)

    DURATION <- 5
    DAILY_N <- 10
    N_PER_COND <- DURATION * DAILY_N

    dt <- data.table(
      day = rep(1:DURATION, each = DAILY_N),
      x = rgamma(n = N_PER_COND, shape = 5, scale = 25)
    )

    cum_stdevs <- vector('double', DURATION)

    tic()
    for (i in seq_along(cum_stdevs)) {
      cum_x <- dt[day <= i, x]
      cum_stdevs[i] <- sd(cum_x)
    }
    toc()
Is there a way to perform this kind of operation within data.table without resorting to a for loop?
Even with the for loop, switching to data.table gave a 14x speedup over standard data frames.
CodePudding user response:
You can try sapply within data.table, like below:
    cum_stdevs <- dt[, sapply(seq_along(cum_stdevs), function(k) sd(x[day <= k]))]
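
As a possible alternative to sapply, here are two sketches that stay inside data.table. The first uses a non-equi join with by = .EACHI so that each cutoff day picks up every row with day <= cutoff. The second accumulates per-day counts, sums, and sums of squares with cumsum and derives the sample standard deviation from those running totals. The names cutoffs, d, daily, and cum_sd are made up for illustration, and set.seed is only there so the comparison is reproducible. I have not benchmarked either against the loop, so treat them as starting points rather than guaranteed speedups.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only to make the check reproducible
DURATION <- 5
DAILY_N <- 10
dt <- data.table(
  day = rep(1:DURATION, each = DAILY_N),
  x = rgamma(n = DURATION * DAILY_N, shape = 5, scale = 25)
)

# Sketch 1: non-equi join. For each cutoff d, join all rows with day <= d
# and compute sd(x) over the matched rows (by = .EACHI groups per row of i).
cutoffs <- data.table(d = 1:DURATION)
res_join <- dt[cutoffs, on = .(day <= d), .(cum_sd = sd(x)), by = .EACHI]

# Sketch 2: running totals. Summarise each day once, then accumulate.
daily <- dt[, .(n = .N, s = sum(x), ss = sum(x^2)), keyby = day]
daily[, `:=`(cn = cumsum(n), cs = cumsum(s), css = cumsum(ss))]
# Sample variance from running totals: (sum(x^2) - sum(x)^2 / n) / (n - 1)
daily[, cum_sd := sqrt((css - cs^2 / cn) / (cn - 1))]

# Reference values from the straightforward per-day subset
ref <- sapply(1:DURATION, function(k) dt[day <= k, sd(x)])
all.equal(res_join$cum_sd, ref)
all.equal(daily$cum_sd, ref)
```

The running-totals version touches each row only once, but the sum-of-squares formula can lose precision when the mean is very large relative to the spread, so the join version is the safer choice if that is a concern.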