I'd like to bootstrap a mixed glm zero-inflated model (m_F
) using the glmmTMB
package, but despite the use of coef
or fixef
for coefficients specification, I always have as output the error:
Error in bres[i, ] <- coef(bfit) :
incorrect number of subscripts on matrix
My example:
library(glmmTMB)
library(boot)
my.ds <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/ds.desenvol.csv")
str(my.ds)
# 'data.frame': 400 obs. of 4 variables:
# $ temp : num 0 0 0 0 0 0 0 0 0 0 ...
# $ storage : int 5 5 5 5 5 5 5 5 5 5 ...
# $ rep : chr "r1" "r2" "r3" "r4" ...
# $ development: int 0 23 22 27 24 25 24 22 0 22 ...
# Fit a GLM mixed Hurdle (zero-inflated) log-link Gamma model
m_F <- glmmTMB(development ~ poly(temp,2) (1 | storage), data = my.ds,
family = ziGamma(link = "log"),
ziformula = ~ 1)
summary(m_F)
# Create a bootstrap aproach
nboot <- 1000
bres <- matrix(NA,nrow=nboot,
ncol=length(coef(m_F)),
dimnames=list(rep=seq(nboot),
coef=names(coef(m_F))))
set.seed(1000)
bootsize <- 100
for (i in seq(nboot)) {
bdat <- my.ds[sample(nrow(my.ds),size=bootsize,replace=TRUE),]
bfit <- update(m_F, data=bdat) ## refit with new data
bres[i,] <- coef(bfit)
}
Please, any help wit it?
CodePudding user response:
My answer is somewhat similar to @RuiBarradas's, but closer to your original code. The main point is that coef()
doesn't do what you think; (1) the convention (set originally by the nlme
package) is that for mixed models coef()
returns a matrix (or list of matrices) of group-level coefficients, while fixef()
returns the fixed-effect (population-level) coefficients; (2) for glmmTMB
, fixef()
returns a list of fixed-effect vectors for the conditional, zero-inflation, and dispersion models (unlist()
collapses this back to a vector with concatenated names).
The other point to keep in mind is that bootstrapping at the level of individual observations may not be sensible for a data set with grouping structure (you can bootstrap at the group level, or the within-group level, or both; you can bootstrap residuals (if you have a linear model - this won't work for GLMMs with count data); you can also use lme4::bootMer
to do parametric bootstrapping, which is pretty much the only alternative when you have GLMMs with crossed random effects).
PS what is bootsize
doing here? The standard approach to bootstrapping is to resample a data set the same size as the original with replacement. Resampling only a quarter of the data set (nrow(my.ds) == 400
, bootsize == 100
) is well-defined but very unusual — are you doing some particular non-standard kind of bootstrap on purpose ... ?
sum_fun <- function(fit) {
unlist(fixef(fit))
}
bres <- matrix(NA,
nrow=nboot,
ncol=length(sum_fun(m_F)),
dimnames=list(rep=seq(nboot),
coef=names(sum_fun(m_F))))
set.seed(1000)
bootsize <- 100
pb <- txtProgressBar(max = bootsize, style = 3)
for (i in seq(nboot)) {
setTxtProgressBar(pb, i)
bdat <- my.ds[sample(nrow(my.ds), size=bootsize,replace=TRUE),]
bfit <- update(m_F, data=bdat) ## refit with new data
bres[i,] <- sum_fun(bfit)
}
CodePudding user response:
To use package boot
, you must define a function that bootstraps the data and then computes the statistic or vector of statistics from it. This is function ziboot
below. Then call boot
passing it the data, the function and the number of replicates.
The function fits the same model as the question's code but must transform the model output in a vector of coefficients. That is what the lapply
does.
library(glmmTMB)
library(boot)
my.ds <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/ds.desenvol.csv")
# Create a bootstrap aproach
# This function will be called by boot() below
ziboot <- function(data, i) {
# this bootstraps the data
d <- data[i, ]
model <- glmmTMB(development ~ temp (1 | storage), data = d,
family = ziGamma(link = "log"),
ziformula = ~ 1)
cf <- coef(model)$cond$storage
l <- as.list(cf)
unlist(lapply(seq_along(l), \(i){
x <- l[[i]]
nms <- paste(names(l)[i], row.names(cf), sep = "_")
setNames(x, nms)
}))
}
set.seed(1000)
bootsize <- 100
b <- boot(my.ds, ziboot, R = bootsize)
colnames(b$t) <- names(b$t0)
head(b$t)
#> (Intercept)_5 (Intercept)_10 (Intercept)_15 (Intercept)_20 (Intercept)_30
#> [1,] 3.156717 3.153949 3.139001 3.147799 3.196308
#> [2,] 3.172563 3.157384 3.164663 3.143005 3.196966
#> [3,] 3.175124 3.154946 3.158715 3.129027 3.168753
#> [4,] 3.149817 3.143550 3.135256 3.141367 3.167679
#> [5,] 3.159183 3.179388 3.147193 3.148219 3.237395
#> [6,] 3.148815 3.168335 3.117576 3.126973 3.178377
#> temp_5 temp_10 temp_15 temp_20 temp_30
#> [1,] -0.004089067 -0.004089067 -0.004089067 -0.004089067 -0.004089067
#> [2,] -0.004404738 -0.004404738 -0.004404738 -0.004404738 -0.004404738
#> [3,] -0.003153053 -0.003153053 -0.003153053 -0.003153053 -0.003153053
#> [4,] -0.003547863 -0.003547863 -0.003547863 -0.003547863 -0.003547863
#> [5,] -0.003989763 -0.003989763 -0.003989763 -0.003989763 -0.003989763
#> [6,] -0.003137722 -0.003137722 -0.003137722 -0.003137722 -0.003137722
Created on 2022-07-05 by the reprex package (v2.0.1)