This previous question "How to repeatedly perform glm over multiple dependent variables after mice?" - did not work for me. I don't understand how it incorporates the pooling of the mids.
######################
I need to repeat logistic regression for 7 dependent variables, using the same 5 predictor variables for each regression.
I need to do multiple imputation using mice before I perform the regressions.
I then need an output of exponentiated odds ratios and confidence intervals for each of the dependent variables.
Here is a smaller example:
P1 <- c(7,8,5,NA)
P2 <- c(10,12,11,12)
P3 <- c(1,1,0,1)
P4 <- c(1,1,1,0)
R1 <- c(0,1,0,1)
R2 <- c(0,0,1,0)
R3 <- c(1,1,0,0)
R4 <- c(0,1,1,1)
R5 <- c(1,0,1,0)
df1 <- data.frame(P1,P2,P3,P4,R1,R2,R3,R4,R5)
cols <- c("P3","P4","R1","R2","R3","R4","R5")
df1[cols] <- lapply(df1[cols], factor)
Multiple Imputation:
library(mice)
imp <- mice(df1, maxit=0)
predM <- imp$predictorMatrix
meth <- imp$method
imp2 <- mice(df1, maxit=5,
predictorMatrix = predM,
method = meth, print = F)
Logistic Regression after pooling imputed data:
m1.mi <- with(imp2, glm(R1 ~ P1 P2 P3 P4, family=binomial(link=logit)))
summary(pool(m1.mi),exponentiate=TRUE, conf.int = TRUE)
Is there a way to create a loop, that will produce an odds ratio and CIs for each dependent variable?
Any help appreciated, I apologise for my very limited knowledge in this area.
CodePudding user response:
reformulate
is neat for jobs like this. Define the process of one iteration in an anonymous function (those with function(x)
or abbreviated \(x)
) and put it into lapply
. The summary
can be subsetted to desired columns just like a data frame using the respective column names
in brackets.
Let's use the nhanes
data coming with mice
in an example.
dvs <- c('age1', 'age2', 'age3')
res <- lapply(dvs, \(x) {
mi <- with(imp, glm(reformulate(c('bmi', 'hyp', 'chl'), x), family=binomial))
s <- summary(pool(mi), exponentiate=TRUE, conf.int=TRUE)
`rownames<-`(s[c('estimate', '2.5 %', '97.5 %')], s$term) |> as.matrix()
}) |> setNames(dvs)
This gives a named list like this one,
res
# $age1
# estimate 2.5 % 97.5 %
# (Intercept) 2.197201e 04 0.0000000 Inf
# bmi 1.999346e 00 0.4691464 8.520549
# hyp 4.331347e-08 0.0000000 Inf
# chl 9.441578e-01 0.8403936 1.060734
#
# $age2
# estimate 2.5 % 97.5 %
# (Intercept) 0.9300514 0.0002224722 3888.105740
# bmi 0.8891490 0.6550513927 1.206907
# hyp 2.8818840 0.2054892623 40.416981
# chl 1.0046189 0.9757483655 1.034344
#
# $age3
# estimate 2.5 % 97.5 %
# (Intercept) 3.5383401 1.087542e-07 1.151206e 08
# bmi 0.6442681 2.792523e-01 1.486403e 00
# hyp 5.9651279 5.954159e-02 5.976117e 02
# chl 1.0323994 9.885890e-01 1.078151e 00
where you can output individual elements in this way:
res$age1
# estimate 2.5 % 97.5 %
# (Intercept) 2.197201e 04 0.0000000 Inf
# bmi 1.999346e 00 0.4691464 8.520549
# hyp 4.331347e-08 0.0000000 Inf
# chl 9.441578e-01 0.8403936 1.060734
Where
class(res$age1)
# [1] "matrix" "array"
If you want class "data.frame"
in the list elements, just leave out the |> as.matrix()
part.
Data:
data("nhanes", package='mice')
nhanes <- transform(nhanes, age1=age == 1, age2=age == 2, age3=age == 3)
library(mice)
imp0 <- mice(nhanes, maxit=0)
imp <- mice(nhanes, maxit=5,
predictorMatrix=imp0$predictorMatrix,
method=imp0$method, print=F)
CodePudding user response:
You could do something like:
model_list <- lapply( paste0("R",1:5),function(dep_var){
formula <- as.formula(paste0(dep_var,"~ P1 P2 P3 P4"))
m1.mi <- with(imp2, glm(formula, family=binomial(link=logit)))
summary(pool(m1.mi),exponentiate=TRUE, conf.int = TRUE) %>%
mutate(depvar = dep_var)
})
do.call(rbind,model_list)
term estimate std.error statistic df p.value 2.5 % 97.5 % depvar
1 (Intercept) 1.704353e-121 621431.55 -4.474823e-04 0 0.9999859 NaN NaN R1
2 P1 1.241210e 04 38928.27 2.421486e-04 0 0.9999923 NaN NaN R1
3 P2 1.540603e 08 59463.92 3.170469e-04 0 0.9999900 NaN NaN R1
4 P3 NA NA NA NA NA NA NA R1
5 P4 NA NA NA NA NA NA NA R1
6 (Intercept) 1.382828e 06 621431.49 2.275334e-05 0 0.9999993 NaN NaN R2
7 P1 6.490963e-09 38928.28 -4.842972e-04 0 0.9999847 NaN NaN R2
8 P2 1.241211e 04 59463.92 1.585235e-04 0 0.9999950 NaN NaN R2
9 P3 NA NA NA NA NA NA NA R2
10 P4 NA NA NA NA NA NA NA R2
11 (Intercept) 7.231535e-07 621431.51 -2.275334e-05 0 0.9999993 NaN NaN R3
12 P1 1.540603e 08 38928.27 4.842972e-04 0 0.9999847 NaN NaN R3
13 P2 8.056654e-05 59463.92 -1.585235e-04 0 0.9999950 NaN NaN R3
14 P3 NA NA NA NA NA NA NA R3
15 P4 NA NA NA NA NA NA NA R3
16 (Intercept) 4.045223e-105 621431.46 -3.868068e-04 0 0.9999878 NaN NaN R4
17 P1 8.056652e-05 38928.27 -2.421486e-04 0 0.9999923 NaN NaN R4
18 P2 1.912213e 12 59463.92 4.755705e-04 0 0.9999850 NaN NaN R4
19 P3 NA NA NA NA NA NA NA R4
20 P4 NA NA NA NA NA NA NA R4
21 (Intercept) 5.867312e 120 621431.46 4.474823e-04 0 0.9999859 NaN NaN R5
22 P1 8.056650e-05 38928.28 -2.421486e-04 0 0.9999923 NaN NaN R5
23 P2 6.490965e-09 59463.92 -3.170470e-04 0 0.9999900 NaN NaN R5
24 P3 NA NA NA NA NA NA NA R5
25 P4 NA NA NA NA NA NA NA R5
Which produces a long format of all your analysis. The trick is only creating your regression formula with the as.formula
function. I use lapply
so it produces a list, which is easier to handle. the do.call
just bind the results together (and that is why I add what is the dependent variable in your summary)