I have been using glm to run logistic regressions, but I need to switch to svyglm because my weights are probability weights. However, I don't really understand the svyglm documentation.
As an example, I am using this data setup, where pweight equals each cylinder group's share of the sample, so the weights say the sample is perfectly representative:
data("mtcars")
mtcars$pweight <- ifelse(mtcars$cyl == 4, (11/32), ifelse(mtcars$cyl == 6, (7/32), (14/32)))
mtcars$goodcar <- rbinom(n=32, size=1, prob=0.3)
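A quick sanity check on those weights (an illustration, not part of the original setup): each pweight value is just the sampling fraction of its cylinder group, which you can confirm from the group counts.

```r
# Group shares of the 32-row sample; these match the pweight values above
# (11/32, 7/32, 14/32 for 4, 6 and 8 cylinders respectively).
data("mtcars")
table(mtcars$cyl) / nrow(mtcars)
#>       4       6       8
#> 0.34375 0.21875 0.43750
```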
If I use the ordinary (non-survey) glm command for the regression, I get the following result:
mylogit <- glm(goodcar ~ mpg + disp + hp, data = mtcars, family = "binomial", weights = pweight)
summary(mylogit)
Call:
glm(formula = goodcar ~ mpg + disp + hp, family = "binomial",
    data = mtcars)
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.6633 -0.7404 -0.4651  0.8658  1.9408
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.279806   4.800398  -0.058   0.9535
mpg          0.016486   0.145431   0.113   0.9097
disp         0.020622   0.009257   2.228   0.0259 *
hp          -0.041091   0.021168  -1.941   0.0522 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 39.75 on 31 degrees of freedom
Residual deviance: 31.27 on 28 degrees of freedom
AIC: 39.27
Number of Fisher Scoring iterations: 5
Based on the above, I tried to set it up like this:
design <- svydesign(
  id = ~0,            # no clusters
  data = mtcars,
  weights = ~pweight
)
mysvyglm <- svyglm(goodcar ~ mpg + disp + hp, data = mtcars, design, family = "binomial")
summary(mysvyglm)
Call:
svyglm(formula = goodcar ~ mpg + disp + hp, design = design,
    family = "binomial", data = mtcars)
Survey design:
svydesign(id = ~0, data = mtcars, weights = ~pweight)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.756867   4.655046   0.163   0.8720
mpg         -0.019180   0.150333  -0.128   0.8994
disp         0.017538   0.008974   1.954   0.0607 .
hp          -0.037779   0.017197  -2.197   0.0365 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 0.8997231)
Number of Fisher Scoring iterations: 5
Am I wrong to think that the glm and the svyglm should be the same in this case, as the pweights show the sample is representative? I feel like I'm missing something and getting this setup wrong.
CodePudding user response:
The mtcars dataset is small: it has only 32 observations. Let's generate a larger dataset instead. The simulation is kept simple and does not sample observations in proportion to the inverse of the weights.
I've also added a third model, geepack::geeglm, which can fit a GLM with inverse-probability weighting.
With the larger dataset, all three models have the same estimates for the coefficients and their standard errors.
Part 1. Generate data
library("survey")
library("geepack")
library("broom")      # provides tidy(); installed with but not attached by the tidyverse
library("tidyverse")
set.seed(1234)
inv_logit <- function(x) {
plogis(x)
}
n <- 100
# Generate dataset
mydata <-
tibble(
id = seq(n),
x1 = rnorm(n),
x2 = rnorm(n),
x3 = rnorm(n),
y = rbinom(n,
size = 1,
prob = inv_logit(0.1 * x1 + 0.2 * x2 + 0.3 * x3) + rnorm(n, sd = 0.1)
),
pweight = runif(n)
) %>%
# geeglm requires that the data be ordered by time (or observation) within each cluster
arrange(id)
Part 2. Fit models
fit.glm <- glm(
  y ~ x1 + x2 + x3,
weights = pweight,
family = "binomial",
data = mydata
)
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
tidy(fit.glm)
#> # A tibble: 4 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -0.319 0.340 -0.938 0.348
#> 2 x1 0.638 0.362 1.77 0.0776
#> 3 x2 0.591 0.372 1.59 0.112
#> 4 x3 0.972 0.446 2.18 0.0291
design <- svydesign(
id = ~1,
weights = ~pweight,
data = mydata
)
fit.svyglm <- svyglm(
  y ~ x1 + x2 + x3,
design = design,
family = "binomial",
data = mydata
)
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
tidy(fit.svyglm)
#> # A tibble: 4 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -0.319 0.269 -1.18 0.239
#> 2 x1 0.638 0.293 2.18 0.0317
#> 3 x2 0.591 0.297 1.99 0.0497
#> 4 x3 0.972 0.329 2.95 0.00397
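Note that glm and svyglm agree on the point estimates; only the standard errors differ, because svyglm replaces glm's model-based variance with a design-based (sandwich/linearization) one. As a rough cross-check (a sketch, assuming the sandwich package is installed; it is not used in the original answer), a heteroskedasticity-robust covariance applied to the weighted glm fit gives standard errors close to svyglm's.

```r
# Robust (HC0 sandwich) standard errors for the weighted glm fit; these are
# close to the svyglm standard errors above, up to a small finite-sample factor.
library(sandwich)
sqrt(diag(vcovHC(fit.glm, type = "HC0")))
```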
fit.geeglm <- geeglm(
  y ~ x1 + x2 + x3,
id = mydata$id,
weights = pweight,
family = "binomial",
corstr = "independence",
data = mydata
)
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
tidy(fit.geeglm)
#> # A tibble: 4 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -0.319 0.268 1.42 0.234
#> 2 x1 0.638 0.291 4.80 0.0284
#> 3 x2 0.591 0.296 3.99 0.0457
#> 4 x3 0.972 0.328 8.80 0.00301
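Finally, the three fits can be lined up side by side (a convenience sketch using the objects created above):

```r
# Combine the tidy() summaries; the estimates are identical across models,
# and svyglm and geeglm report essentially the same robust standard errors.
bind_rows(
  glm    = tidy(fit.glm),
  svyglm = tidy(fit.svyglm),
  geeglm = tidy(fit.geeglm),
  .id = "model"
) %>%
  select(model, term, estimate, std.error) %>%
  arrange(term, model)
```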
Created on 2022-03-16 by the reprex package (v2.0.1)