How to get emmeans to print degrees of freedom for glmer class-CodePudding

I'm trying to get the degrees of freedom from emmeans of a glmer model for reporting reasons, but they just show Inf.

Here's some sample data. In the real data, there is no nesting structure, this is just a consequence of how I built the data frame:

set.seed(1234)
dat <- data.frame(
  dv=c(rnorm(mean=1, sd=0.2, n=12000)), 
  id=c(rep(c("1", "2", "3"), times=c(4000, 4000, 4000))), 
  region=c(rep(rep(c("1", "2"), times=c(2000, 2000)), 3)), 
  intervention=c(rep(c("1", "2", "1"), times=c(4000, 4000, 4000))), 
  timepoint=c(rep(rep(c("1", "2"), times=c(2000, 2000)), times=3)), 
  direction=c(rep(rep(c("1", "2"), times=c(2000, 2000)), 3))
)

glmm_1 <- glmer(dv ~ intervention*timepoint*region   direction   (1|id), data=dat, family=gaussian(link="log"))

glmm_1_emm <- emmeans::emmeans(glmm_1, pairwise ~ intervention*region*timepoint, type = "response")

glmm_1_emm$emmeans

NOTE: A nesting structure was detected in the fitted model:
    timepoint %in% (direction*region), region %in% direction
 region timepoint direction intervention response      SE  df asymp.LCL asymp.UCL
 1      1         1         1                   1 0.00313 Inf     0.994      1.01
 2      2         2         1                   1 0.00313 Inf     0.998      1.01
 1      1         1         2                   1 0.00442 Inf     0.992      1.01
 2      2         2         2                   1 0.00442 Inf     0.995      1.01

Confidence level used: 0.95 
Intervals are back-transformed from the log scale

CodePudding user response：

This is really more of a statistical (i.e. for CrossValidated) than a computational question. tl;dr finite-size corrections are rarely considered for GLMs or GLMMs, and for GLMMs in particular there is little theoretical work I'm aware of that would even specify how to compute them. That's why emmeans etc. report df as Inf.

df in emmeans output represents the "denominator degrees of freedom" (i.e. the nu2 value you would use if testing against an F distribution F_{nu1,nu2}), which is something like (number of observations - number of parameters estimated) for simple (non-mixed) models like a linear regression or simple ANOVA, but which is considerably harder to define for multilevel models (i.e. linear mixed models). For generalized linear (and linear mixed) models, it gets even worse. Quoting from the "degrees of freedom" section of the GLMM FAQ (see there for full references):

When the responses are not normally distributed (as in GLMs and GLMMs), and when the scale parameter is not estimated (as in standard Poisson- and binomial-response models), then the deviance differences are only asymptotically F- or chi-square-distributed (i.e. not for our real, finite-size samples). In standard GLM practice, we usually ignore this problem; there is some literature on finite-size corrections for GLMs under the rubrics of “Bartlett corrections” and “higher order asymptotics” (see McCullagh and Nelder (1989), Cordeiro, Paula, and Botter (1994), Cordeiro and Ferrari (1998) and the cond package (on CRAN) [which works with GLMs, not GLMMs]), but it’s rarely used. (The bias correction/Firth approach implemented in the brglm package attempts to address the problem of finite-size bias, not finite-size non-chi-squaredness of the deviance differences.)

When the scale parameter in a GLM is estimated rather than fixed (as in Gamma or quasi-likelihood models), it is sometimes recommended to use an F test to account for the uncertainty of the scale parameter (e.g. Venables and Ripley (2002) recommend anova(...,test="F") for quasi-likelihood models) Combining these issues, one has to look pretty hard for information on small-sample or finite-size corrections for GLMMs: Feng, Braun, and McCulloch (2004) and Bell and Grunwald (2010) look like good starting points, but it’s not at all trivial.

CodePudding user response：

Apparently emmeans::emmeans calculates Inf degrees of freedom, not sure why. But I've spotted this:

str(glmm_1_emm$emmeans)
# 'emmGrid' object with variables:
#   region = 1, 2
# timepoint = 1, 2
# direction = 1, 2
# intervention = 1, 2
# Nesting structure:  timepoint %in% (direction*region), region %in% direction
# Transformation: “log” 
# Some things are non-estimable (null space dim = 5)  ## <--------------------- !!!

There's a summary method involved, emmeans:::summary.emmGrid, since the summary is not calculated until you print it, i.e. evaluate glmm_1_emm$emmeans.

Provided that the degrees of freedom are correct later on, then you could extract them using a rather artless capture.output approach:

tmp <- capture.output(glmm_1_emm$emmeans)
res <- read.table(text=tmp[1:(which(tmp == '') - 1)], header=TRUE)
res
#   region timepoint direction intervention response      SE  df asymp.LCL asymp.UCL
# 1      1         1         1            1        1 0.00313 Inf     0.994      1.01
# 2      2         2         2            1        1 0.00313 Inf     0.998      1.01
# 3      1         1         1            2        1 0.00442 Inf     0.992      1.01
# 4      2         2         2            2        1 0.00442 Inf     0.995      1.01

res[, 7]  ## degrees of freedom
# [1] Inf Inf Inf Inf