Loop through mixed binomial logistic regression model using formulas in a column of the dataset

Time:11-27

My goal is to write a loop that fits multiple mixed binomial logistic regression models, each using a different formula stored either in a separate data frame or in a column of the model's data frame.

df2 has one formula per row. For simplicity I repeated the same formula 10 times, but in practice I would run 10 different formulas.

I merged df1 (the data for the model) and df2 (the formulas for the model) into df3 in case everything needs to be in one dataset.

I am fitting the models using the gamlj package function gamljGlmMixed.

If it's possible to loop over the formulas in the column (or in df2), I will then write code to extract the AIC and BIC from each model so I can pick the best one.

At the end I have included the code for fitting a single model without a loop.

#install gamlj from github if needed
devtools::install_github("gamlj/gamlj")
#load packages 
library(tidyverse)
library(gamlj)

#add example data 
df1 <- read.csv("https://stats.idre.ucla.edu/stat/data/hdp.csv")
df1

#data with formulas for loop 
# (the same formula repeated 10 times for simplicity)
df2 <- data.frame(
  Column1 = rep("remission ~ 1 + IL6 + CRP + (1 + IL6 + CRP | DID)", 10),
  stringsAsFactors = FALSE
)
# Merge the data sets by row number
df1 <- df1 %>% mutate(Row_ = row_number())
df2 <- df2 %>% mutate(Row_ = row_number())

df3 <- merge(df1, df2, by="Row_", all = T)
df3

gamlj::gamljGlmMixed(
  formula = remission ~ 1 + IL6 + CRP + (1 + IL6 + CRP | DID),
  data = df3,
  showParamsCI = TRUE,
  showExpbCI = FALSE,
  modelSelection = "custom",
  custom_family = "binomial",
  custom_link = "logit")

Answer:

Using your df1 and the formulas from df2 as a character vector, you could loop over the formulas, convert each one with as.formula(), and store the results in a list (as below). I use base R's glm() so the pattern is easy to try without gamlj, but it should be the same for gamlj::gamljGlmMixed(). Keep in mind that glm() does not fit random effects, so it only illustrates the looping here. You don't need df3; the analytic data is just df1.

# Just need a character vector of the model formulas.
# (df2$Column1 rather than unlist(df2), which would also pull in the Row_ column added above)
fmlas <- df2$Column1

model_list <- list()
for (xx in fmlas){
  # family = binomial gives a logistic model; glm()'s default family is gaussian
  model_list[[xx]] <- glm(as.formula(xx), family = binomial, data = df1)
}
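If some formula strings repeat, or one of them might fail to fit, a variant that stores each result by position (instead of by formula string) can help. This is a minimal sketch, assuming the built-in mtcars data (its `am` column is a 0/1 outcome) and a few made-up formula strings as stand-ins for df1 and df2$Column1; the names `fmla_strings` and `model_1`, `model_2`, ... are illustrative only.

```r
# Stand-ins (assumptions): the built-in mtcars data, whose `am` column is a
# 0/1 outcome, and a few example formula strings, one of them duplicated.
fmla_strings <- c("am ~ wt", "am ~ wt + hp", "am ~ wt")

# Fit one model per formula and store it by position, so identical formula
# strings do not overwrite each other. tryCatch() returns NULL for a formula
# that fails to fit instead of stopping the whole loop.
model_list <- lapply(seq_along(fmla_strings), function(i) {
  tryCatch(
    glm(as.formula(fmla_strings[i]), family = binomial, data = mtcars),
    error = function(e) {
      message("Formula ", i, " failed: ", conditionMessage(e))
      NULL
    }
  )
})
names(model_list) <- paste0("model_", seq_along(fmla_strings))

length(model_list)  # 3: one element per formula, duplicates included
```

For the gamlj models, the glm() call inside tryCatch() would be swapped for the gamlj::gamljGlmMixed() call with data = df1; I haven't run that version.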

Or for your model:

for (xx in fmlas){
  model_list[[xx]] <- gamlj::gamljGlmMixed(
    formula = as.formula(xx),
    data = df1,
    showParamsCI = TRUE,
    showExpbCI = FALSE,
    modelSelection = "custom",
    custom_family = "binomial",
    custom_link = "logit"
  )
}

[Note that with the sample data, each iteration overwrites the same list element, because the list is named by formula and all the formulas in the sample data are identical.]
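For the AIC/BIC comparison mentioned in the question, one option is to collect both criteria into a table and sort it. A minimal sketch, assuming the fitted objects support the standard AIC() and BIC() generics (glm fits do); it again uses the built-in mtcars data and made-up formulas as stand-ins, and `ic_table` and `best_formula` are illustrative names.

```r
# Stand-in models fitted to the built-in mtcars data (binary outcome `am`).
fmlas <- c("am ~ wt", "am ~ wt + hp", "am ~ hp")
model_list <- lapply(fmlas, function(f) {
  glm(as.formula(f), family = binomial, data = mtcars)
})

# Collect the information criteria into one data frame, one row per formula.
ic_table <- data.frame(
  formula = fmlas,
  AIC = vapply(model_list, AIC, numeric(1)),
  BIC = vapply(model_list, BIC, numeric(1)),
  stringsAsFactors = FALSE
)

# Sort so the lowest (preferred) AIC comes first.
ic_table <- ic_table[order(ic_table$AIC), ]
ic_table

best_formula <- ic_table$formula[1]
```

AIC and BIC are only comparable across models fitted to the same rows, so if different formulas use predictors with different missing values, check that the row counts match. I haven't verified whether AIC() and BIC() work directly on gamlj result objects; if they don't, the values would need to be pulled from the model's fit output instead.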
