Bootstrap method for mixed glm zero-inflated model-CodePudding

I'd like to bootstrap a mixed glm zero-inflated model (m_F) using the glmmTMB package, but despite the use of coef or fixef for coefficients specification, I always have as output the error:

Error in bres[i, ] <- coef(bfit) : 
  incorrect number of subscripts on matrix

My example:

library(glmmTMB)
library(boot)
my.ds <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/ds.desenvol.csv")
str(my.ds)
# 'data.frame': 400 obs. of  4 variables:
#  $ temp       : num  0 0 0 0 0 0 0 0 0 0 ...
#  $ storage    : int  5 5 5 5 5 5 5 5 5 5 ...
#  $ rep        : chr  "r1" "r2" "r3" "r4" ...
#  $ development: int  0 23 22 27 24 25 24 22 0 22 ...

# Fit a GLM mixed Hurdle (zero-inflated) log-link Gamma model
m_F <- glmmTMB(development ~ poly(temp,2)   (1 | storage), data = my.ds,
               family = ziGamma(link = "log"),
               ziformula = ~ 1)
               summary(m_F)

# Create a bootstrap aproach
nboot <- 1000
bres <- matrix(NA,nrow=nboot,
                  ncol=length(coef(m_F)),
                  dimnames=list(rep=seq(nboot),
                                coef=names(coef(m_F))))
set.seed(1000)
bootsize <- 100
for (i in seq(nboot)) {
  bdat <- my.ds[sample(nrow(my.ds),size=bootsize,replace=TRUE),]
  bfit <- update(m_F, data=bdat)  ## refit with new data
  bres[i,] <- coef(bfit)
}

Please, any help wit it?

CodePudding user response：

My answer is somewhat similar to @RuiBarradas's, but closer to your original code. The main point is that coef() doesn't do what you think; (1) the convention (set originally by the nlme package) is that for mixed models coef() returns a matrix (or list of matrices) of group-level coefficients, while fixef() returns the fixed-effect (population-level) coefficients; (2) for glmmTMB, fixef() returns a list of fixed-effect vectors for the conditional, zero-inflation, and dispersion models (unlist() collapses this back to a vector with concatenated names).

The other point to keep in mind is that bootstrapping at the level of individual observations may not be sensible for a data set with grouping structure (you can bootstrap at the group level, or the within-group level, or both; you can bootstrap residuals (if you have a linear model - this won't work for GLMMs with count data); you can also use lme4::bootMer to do parametric bootstrapping, which is pretty much the only alternative when you have GLMMs with crossed random effects).

PS what is bootsize doing here? The standard approach to bootstrapping is to resample a data set the same size as the original with replacement. Resampling only a quarter of the data set (nrow(my.ds) == 400, bootsize == 100) is well-defined but very unusual — are you doing some particular non-standard kind of bootstrap on purpose ... ?

sum_fun <- function(fit) {
    unlist(fixef(fit))
}

bres <- matrix(NA,
               nrow=nboot,
               ncol=length(sum_fun(m_F)),
               dimnames=list(rep=seq(nboot),
                             coef=names(sum_fun(m_F))))
set.seed(1000)
bootsize <- 100
pb <- txtProgressBar(max = bootsize, style = 3)
for (i in seq(nboot)) {
    setTxtProgressBar(pb, i)
    bdat <- my.ds[sample(nrow(my.ds), size=bootsize,replace=TRUE),]
    bfit <- update(m_F, data=bdat)  ## refit with new data
    bres[i,] <- sum_fun(bfit)
}

CodePudding user response：

To use package boot, you must define a function that bootstraps the data and then computes the statistic or vector of statistics from it. This is function ziboot below. Then call boot passing it the data, the function and the number of replicates.

The function fits the same model as the question's code but must transform the model output in a vector of coefficients. That is what the lapply does.

library(glmmTMB)
library(boot)

my.ds <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/ds.desenvol.csv")

# Create a bootstrap aproach
# This function will be called by boot() below
ziboot <- function(data, i) {
  # this bootstraps the data
  d <- data[i, ]
  model <- glmmTMB(development ~ temp   (1 | storage), data = d,
                 family = ziGamma(link = "log"),
                 ziformula = ~ 1)
  cf <- coef(model)$cond$storage
  l <- as.list(cf)
  unlist(lapply(seq_along(l), \(i){
    x <- l[[i]]
    nms <- paste(names(l)[i], row.names(cf), sep = "_")
    setNames(x, nms)
  }))
}

set.seed(1000)
bootsize <- 100
b <- boot(my.ds, ziboot, R = bootsize)
colnames(b$t) <- names(b$t0)
head(b$t)
#>      (Intercept)_5 (Intercept)_10 (Intercept)_15 (Intercept)_20 (Intercept)_30
#> [1,]      3.156717       3.153949       3.139001       3.147799       3.196308
#> [2,]      3.172563       3.157384       3.164663       3.143005       3.196966
#> [3,]      3.175124       3.154946       3.158715       3.129027       3.168753
#> [4,]      3.149817       3.143550       3.135256       3.141367       3.167679
#> [5,]      3.159183       3.179388       3.147193       3.148219       3.237395
#> [6,]      3.148815       3.168335       3.117576       3.126973       3.178377
#>            temp_5      temp_10      temp_15      temp_20      temp_30
#> [1,] -0.004089067 -0.004089067 -0.004089067 -0.004089067 -0.004089067
#> [2,] -0.004404738 -0.004404738 -0.004404738 -0.004404738 -0.004404738
#> [3,] -0.003153053 -0.003153053 -0.003153053 -0.003153053 -0.003153053
#> [4,] -0.003547863 -0.003547863 -0.003547863 -0.003547863 -0.003547863
#> [5,] -0.003989763 -0.003989763 -0.003989763 -0.003989763 -0.003989763
#> [6,] -0.003137722 -0.003137722 -0.003137722 -0.003137722 -0.003137722

^{Created on 2022-07-05 by the reprex package (v2.0.1)}