Looping/sapply through nlme function-CodePudding

I am trying to execute a loop with mixed-model effects with response variable changing. I came from here and here. I should say that I have tried sthg creating a function and then sapply or lapply (wihtout success)

I provide a small dataset (really small) just to represent my original database (much larger and similar to those of longitudinal studies)

data<- structure(list(paciente = structure(c(6527, 6302, 6454, 6302, 
6238), label = "Paciente", format.spss = "F6.0"), edad_s1 = structure(c(63, 
60, 64, 60, 66), label = "Edad", format.spss = "F3.0"), sexo_s1 = structure(c(1L, 
1L, 1L, 1L, 2L), .Label = c("Hombre", "Mujer"), label = "Sexo", class = "factor"), 
    peso1 = c(90.5, 110.2, 92, 107.7, 85), time = structure(c(1L, 
    3L, 3L, 1L, 3L), .Label = c("0", "1", "2"), class = "factor"), 
    cintura1 = c(114, 121, 113, 121, 118), tasis2_e = c(121, 
    142, 123, 138, 146), tadias2_e = c(56, 79, 76, 85, 81), p17_total = c(4, 
    10, 11, 7, 10), geaf_tot = c(559.44, 923.08, 1272.73, 1384.62, 
    1258.74), glucosa = c(90, 149, 90, 126, 185), albumi = c(4.48, 
    4.65, 4.59, 4.65, 4.72), coltot = c(250, 241, 280, 211, 207
    ), hdl = c(49, 55, 53, 45, 56), ldl_calc = c(178, 145, NA, 
    137, 114), trigli = c(117, 203, 414, 143, 183), hba1c = c(5.9, 
    6.38, 5.24, 5.86, 8.02), i_hucpeptide = c(988.17, 1046.24, 
    1250.17, 1021.56, 1548.18), i_hughrelin = c(1292.73, 375.44, 
    727.79, 406.6, 859.89), i_hugip = c(2.67, 2.67, 2.67, 2.67, 
    2.67), i_huglp1 = c(60.29, 374.77, 213.82, 258.84, 192.61
    ), i_huglucagon = c(379.54, 781.85, 503.48, 642.2, 554.8), 
    i_huinsulin = c(214.73, 532.7, 391.98, 518.59, 650.66), i_huleptin = c(6049.44, 
    6423.4, 13185.8, 7678.36, 8906.6), i_hupai1 = c(1925.44, 
    3374.85, 2507.28, 2723.42, 2844.02), i_huresistin = c(4502.68, 
    4481.43, 5895.57, 4570.61, 2417.61), i_huvisfatin = c(816.18, 
    1266.7, 2321.13, 1276.66, 1196.37), col_rema = c(23, 41, 
    NA, 29, 37), homa = c(7.953, 32.663, 14.518, 26.89, 49.536
    ), i_pcr = c(NA, 0.14, 0.21, 0.67, 0.17)), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

Afterwards I am defining my iteration and my variables database

ex<- subset(data[, 6:30])

for (i in 1:length(ex)) {
  var_1 <- ex[,i]
  var_1 <- unlist(var_1)
  lme_1 <- lme(var_1 ~ sexo_s1*peso1   edad_s1   p17_total   poly(time, 2)*grupo_int_v00,
             random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
           data = dat_longer, subset = !is.na(var_1))

Error in model.frame.default(formula = ~time   var_1   sexo_s1   peso1    : 
  invalid type (list) for variable 'var_1'

I have tried unlisting/as.data.frame in before running the loop

for (i in 1:length(data)) {
  var_1 <- data[,i]
  var_1 <- unlist(var_1) #or as.data.frame(var_1)
    lme_1 <- lme(var_1 ~ sexo_s1*peso1   edad_s1   p17_total   poly(time, 2)*grupo_int_v00,
             random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
           data = dat_longer, subset = !is.na(var_1))
}
Error in model.frame.default(formula = ~time   var_1   sexo_s1   peso1    : 
  variable lengths differ (found for 'var_1')

I have also tried to develop a new function to iterate over

lme_z <- function(z){
out <-  lme(z ~ sexo_s1*peso1   edad_s1   p17_total   poly(time, 2)*grupo_int_v00,
             random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
           data = dat_longer, subset = !is.na(z))
  
}
Error

If there is some contribution to iterate in the response variable (I know Ben Bolker is an expert) Thanks in advance

CodePudding user response：

If data is a data frame containing all of the variables that you use in your formula, including all of the responses that you want to consider, then you can do:

f <- function(resp) {
    fixed <- . ~ sexo_s1 * peso1   edad_s1   p17_total   poly(time, 2) * grupo_int_v00
    fixed[[2L]] <- as.name(resp)
    lme(fixed = fixed,
        random = ~poly(time, 2) | paciente, 
        data = data,
        subset = !is.na(data[[resp]]),
        control = lmeControl(opt = "optim"))
}

list_of_lme_objects <- lapply(names_of_response_variables, f)

An important piece is:

fixed <- . ~ sexo_s1 * peso1   edad_s1   p17_total   poly(time, 2) * grupo_int_v00
fixed[[2L]] <- as.name(resp)

The second statement injects the response named resp into the left hand side of the formula template. A more transparent example:

fixed <- . ~ world
fixed[[2L]] <- as.name("hello")
fixed
## hello ~ world

Another important piece is:

subset = !is.na(data[[resp]])

Here, the right hand side actually evaluates to a logical vector of length equal to the number of rows of data. You might consider passing na.action = na.omit instead of subset, though that will also omit rows where the independent variables have missing values, so the semantics are slightly different.

The variable grupo_int_v00 is missing from your data frame. You'll have to fix that on your end in order to test the code...

CodePudding user response：

I was going to suggest:

formvars <- c("sexo_s1*peso1",
              "edad_s1",
              "p17_total",
              "poly(time, 2)")
## excluded *grupo_int_v00 since not in example data frame
respvars <- names(df)[6:30]
for (r in respvars) {
  lme_1 <- lme(reformulate(formvars, response = r),
               random = ~ poly(time, 2)|paciente,
               control=lmeControl(opt="optim"),
               data = df,
               na.action = na.exclude)
}

Many of @MikaelJagan's points are well taken. In particular:

grupo_int_v00 excluded since it wasn't in your example data set
this code doesn't work for your example since there are only two complete cases (i.e., observations with no missing predictors/responses) in the data set, so we can't fit a quadratic polynomial ("degree must be less than the number of unique points")
I used na.exclude, which obviates your subset argument; it excludes NA values when fitting but will re-introduce them e.g. in calculating predictions or residuals