Home > database >  r survey package: Geometric means with confidence intervals
r survey package: Geometric means with confidence intervals

Time:02-03

I am roughly reproducing this code:

library(haven)
library(survey)
library(dplyr)

nhanesDemo <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT"))

# Rename variables into something more readable
nhanesDemo$fpl <- nhanesDemo$INDFMPIR
nhanesDemo$age <- nhanesDemo$RIDAGEYR
nhanesDemo$gender <- nhanesDemo$RIAGENDR
nhanesDemo$persWeight <- nhanesDemo$WTINT2YR
nhanesDemo$psu <- nhanesDemo$SDMVPSU
nhanesDemo$strata <- nhanesDemo$SDMVSTRA

# Select the necessary columns
nhanesAnalysis <- nhanesDemo %>%
  select(fpl, age, gender, persWeight, psu, strata)

# Set up the design
nhanesDesign <- svydesign(id      = ~psu,
                          strata  = ~strata,
                          weights = ~persWeight,
                          nest    = TRUE,
                          data    = nhanesAnalysis)

# Select those between the agest of 18 and 79
ageDesign <- subset(nhanesDesign, age > 17 & age < 80)

They calculate a simple arithmetic mean as follows:

# Arithmetic mean
svymean(~age, ageDesign, na.rm = TRUE)

I would like to calculate (1) the geometric mean using svymean or some related method (2) with a 95% confidence interval of the geometric mean without having to manually use the SE, if possible. I tried

svymean(~log(age), log(ageDesign), na.rm = TRUE)

but that throws an error. How do I do this?

CodePudding user response:

You want to take the logarithm of the variable, not of the whole survey design (which doesn't make any sense)

Here's how to compute the mean of log(age), get a confidence interval, and then exponentiate it to the geometric mean scale

> svymean(~log(age), ageDesign, na.rm = TRUE)
           mean     SE
log(age) 3.7489 0.0115
> meanlog<-svymean(~log(age), ageDesign, na.rm = TRUE)
> confint(meanlog)
            2.5 %   97.5 %
log(age) 3.726351 3.771372
> exp(confint(meanlog))
            2.5 %   97.5 %
log(age) 41.52729 43.43963

Or, you can exponentiate first and then construct the confidence interval:

> (geomean<-svycontrast(meanlog, quote(exp(`log(age)`))))
          nlcon     SE
contrast 42.473 0.4878
> confint(geomean)
            2.5 %   97.5 %
contrast 41.51661 43.42879

I would expect the first approach to give better confidence intervals in general, but it makes almost no difference here.

  • Related