Home > Enterprise >  Creating a loop for multiple dependent variables in logistic regression using a multiple imputation
Creating a loop for multiple dependent variables in logistic regression using a multiple imputation

Time:07-01

This previous question "How to repeatedly perform glm over multiple dependent variables after mice?" - did not work for me. I don't understand how it incorporates the pooling of the mids.

######################

I need to repeat logistic regression for 7 dependent variables, using the same 5 predictor variables for each regression.

I need to do multiple imputation using mice before I perform the regressions.

I then need an output of exponentiated odds ratios and confidence intervals for each of the dependent variables.

Here is a smaller example:

 P1 <- c(7,8,5,NA)
 P2 <- c(10,12,11,12)
 P3 <- c(1,1,0,1)
 P4 <- c(1,1,1,0)

 R1 <- c(0,1,0,1)
 R2 <- c(0,0,1,0)
 R3 <- c(1,1,0,0)
 R4 <- c(0,1,1,1)
 R5 <- c(1,0,1,0)

df1 <- data.frame(P1,P2,P3,P4,R1,R2,R3,R4,R5)
cols <- c("P3","P4","R1","R2","R3","R4","R5")

 df1[cols] <- lapply(df1[cols], factor)

Multiple Imputation:

library(mice)
imp <- mice(df1, maxit=0)

predM <- imp$predictorMatrix

meth <- imp$method

imp2 <- mice(df1, maxit=5, 
         predictorMatrix = predM,
         method = meth, print = F)

Logistic Regression after pooling imputed data:

m1.mi <- with(imp2, glm(R1 ~ P1   P2   P3   P4, family=binomial(link=logit)))

summary(pool(m1.mi),exponentiate=TRUE, conf.int = TRUE)

Is there a way to create a loop, that will produce an odds ratio and CIs for each dependent variable?

Any help appreciated, I apologise for my very limited knowledge in this area.

CodePudding user response:

reformulate is neat for jobs like this. Define the process of one iteration in an anonymous function (those with function(x) or abbreviated \(x)) and put it into lapply. The summary can be subsetted to desired columns just like a data frame using the respective column names in brackets.

Let's use the nhanes data coming with mice in an example.

dvs <- c('age1', 'age2', 'age3')

res <- lapply(dvs, \(x) {
  mi <- with(imp, glm(reformulate(c('bmi', 'hyp', 'chl'), x), family=binomial))
  s <- summary(pool(mi), exponentiate=TRUE, conf.int=TRUE)
  `rownames<-`(s[c('estimate', '2.5 %', '97.5 %')], s$term) |> as.matrix()
}) |> setNames(dvs)

This gives a named list like this one,

res
# $age1
#                 estimate     2.5 %   97.5 %
# (Intercept) 2.197201e 04 0.0000000      Inf
# bmi         1.999346e 00 0.4691464 8.520549
# hyp         4.331347e-08 0.0000000      Inf
# chl         9.441578e-01 0.8403936 1.060734
# 
# $age2
#              estimate        2.5 %      97.5 %
# (Intercept) 0.9300514 0.0002224722 3888.105740
# bmi         0.8891490 0.6550513927    1.206907
# hyp         2.8818840 0.2054892623   40.416981
# chl         1.0046189 0.9757483655    1.034344
# 
# $age3
#              estimate        2.5 %       97.5 %
# (Intercept) 3.5383401 1.087542e-07 1.151206e 08
# bmi         0.6442681 2.792523e-01 1.486403e 00
# hyp         5.9651279 5.954159e-02 5.976117e 02
# chl         1.0323994 9.885890e-01 1.078151e 00

where you can output individual elements in this way:

res$age1
#                 estimate     2.5 %   97.5 %
# (Intercept) 2.197201e 04 0.0000000      Inf
# bmi         1.999346e 00 0.4691464 8.520549
# hyp         4.331347e-08 0.0000000      Inf
# chl         9.441578e-01 0.8403936 1.060734

Where

class(res$age1)
# [1] "matrix" "array" 

If you want class "data.frame" in the list elements, just leave out the |> as.matrix() part.


Data:

data("nhanes", package='mice')
nhanes <- transform(nhanes, age1=age == 1, age2=age == 2, age3=age == 3)
library(mice)
imp0 <- mice(nhanes, maxit=0)
imp <- mice(nhanes, maxit=5, 
            predictorMatrix=imp0$predictorMatrix,
            method=imp0$method, print=F)

CodePudding user response:

You could do something like:

model_list <- lapply( paste0("R",1:5),function(dep_var){
  formula <- as.formula(paste0(dep_var,"~ P1   P2   P3   P4"))
  m1.mi <- with(imp2, glm(formula, family=binomial(link=logit)))
  
  summary(pool(m1.mi),exponentiate=TRUE, conf.int = TRUE) %>%
    mutate(depvar = dep_var)
})

do.call(rbind,model_list)

          term      estimate std.error     statistic df   p.value 2.5 % 97.5 % depvar
1  (Intercept) 1.704353e-121 621431.55 -4.474823e-04  0 0.9999859   NaN    NaN     R1
2           P1  1.241210e 04  38928.27  2.421486e-04  0 0.9999923   NaN    NaN     R1
3           P2  1.540603e 08  59463.92  3.170469e-04  0 0.9999900   NaN    NaN     R1
4           P3            NA        NA            NA NA        NA    NA     NA     R1
5           P4            NA        NA            NA NA        NA    NA     NA     R1
6  (Intercept)  1.382828e 06 621431.49  2.275334e-05  0 0.9999993   NaN    NaN     R2
7           P1  6.490963e-09  38928.28 -4.842972e-04  0 0.9999847   NaN    NaN     R2
8           P2  1.241211e 04  59463.92  1.585235e-04  0 0.9999950   NaN    NaN     R2
9           P3            NA        NA            NA NA        NA    NA     NA     R2
10          P4            NA        NA            NA NA        NA    NA     NA     R2
11 (Intercept)  7.231535e-07 621431.51 -2.275334e-05  0 0.9999993   NaN    NaN     R3
12          P1  1.540603e 08  38928.27  4.842972e-04  0 0.9999847   NaN    NaN     R3
13          P2  8.056654e-05  59463.92 -1.585235e-04  0 0.9999950   NaN    NaN     R3
14          P3            NA        NA            NA NA        NA    NA     NA     R3
15          P4            NA        NA            NA NA        NA    NA     NA     R3
16 (Intercept) 4.045223e-105 621431.46 -3.868068e-04  0 0.9999878   NaN    NaN     R4
17          P1  8.056652e-05  38928.27 -2.421486e-04  0 0.9999923   NaN    NaN     R4
18          P2  1.912213e 12  59463.92  4.755705e-04  0 0.9999850   NaN    NaN     R4
19          P3            NA        NA            NA NA        NA    NA     NA     R4
20          P4            NA        NA            NA NA        NA    NA     NA     R4
21 (Intercept) 5.867312e 120 621431.46  4.474823e-04  0 0.9999859   NaN    NaN     R5
22          P1  8.056650e-05  38928.28 -2.421486e-04  0 0.9999923   NaN    NaN     R5
23          P2  6.490965e-09  59463.92 -3.170470e-04  0 0.9999900   NaN    NaN     R5
24          P3            NA        NA            NA NA        NA    NA     NA     R5
25          P4            NA        NA            NA NA        NA    NA     NA     R5

Which produces a long format of all your analysis. The trick is only creating your regression formula with the as.formula function. I use lapply so it produces a list, which is easier to handle. the do.call just bind the results together (and that is why I add what is the dependent variable in your summary)

  • Related