I am trying to execute a loop with mixed-model effects with response variable changing. I came from here and here. I should say that I have tried sthg creating a function and then sapply or lapply (wihtout success)
I provide a small dataset (really small) just to represent my original database (much larger and similar to those of longitudinal studies)
data<- structure(list(paciente = structure(c(6527, 6302, 6454, 6302,
6238), label = "Paciente", format.spss = "F6.0"), edad_s1 = structure(c(63,
60, 64, 60, 66), label = "Edad", format.spss = "F3.0"), sexo_s1 = structure(c(1L,
1L, 1L, 1L, 2L), .Label = c("Hombre", "Mujer"), label = "Sexo", class = "factor"),
peso1 = c(90.5, 110.2, 92, 107.7, 85), time = structure(c(1L,
3L, 3L, 1L, 3L), .Label = c("0", "1", "2"), class = "factor"),
cintura1 = c(114, 121, 113, 121, 118), tasis2_e = c(121,
142, 123, 138, 146), tadias2_e = c(56, 79, 76, 85, 81), p17_total = c(4,
10, 11, 7, 10), geaf_tot = c(559.44, 923.08, 1272.73, 1384.62,
1258.74), glucosa = c(90, 149, 90, 126, 185), albumi = c(4.48,
4.65, 4.59, 4.65, 4.72), coltot = c(250, 241, 280, 211, 207
), hdl = c(49, 55, 53, 45, 56), ldl_calc = c(178, 145, NA,
137, 114), trigli = c(117, 203, 414, 143, 183), hba1c = c(5.9,
6.38, 5.24, 5.86, 8.02), i_hucpeptide = c(988.17, 1046.24,
1250.17, 1021.56, 1548.18), i_hughrelin = c(1292.73, 375.44,
727.79, 406.6, 859.89), i_hugip = c(2.67, 2.67, 2.67, 2.67,
2.67), i_huglp1 = c(60.29, 374.77, 213.82, 258.84, 192.61
), i_huglucagon = c(379.54, 781.85, 503.48, 642.2, 554.8),
i_huinsulin = c(214.73, 532.7, 391.98, 518.59, 650.66), i_huleptin = c(6049.44,
6423.4, 13185.8, 7678.36, 8906.6), i_hupai1 = c(1925.44,
3374.85, 2507.28, 2723.42, 2844.02), i_huresistin = c(4502.68,
4481.43, 5895.57, 4570.61, 2417.61), i_huvisfatin = c(816.18,
1266.7, 2321.13, 1276.66, 1196.37), col_rema = c(23, 41,
NA, 29, 37), homa = c(7.953, 32.663, 14.518, 26.89, 49.536
), i_pcr = c(NA, 0.14, 0.21, 0.67, 0.17)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
Afterwards I am defining my iteration and my variables database
ex<- subset(data[, 6:30])
for (i in 1:length(ex)) {
var_1 <- ex[,i]
var_1 <- unlist(var_1)
lme_1 <- lme(var_1 ~ sexo_s1*peso1 edad_s1 p17_total poly(time, 2)*grupo_int_v00,
random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
data = dat_longer, subset = !is.na(var_1))
Error in model.frame.default(formula = ~time var_1 sexo_s1 peso1 :
invalid type (list) for variable 'var_1'
I have tried unlisting/as.data.frame in before running the loop
for (i in 1:length(data)) {
var_1 <- data[,i]
var_1 <- unlist(var_1) #or as.data.frame(var_1)
lme_1 <- lme(var_1 ~ sexo_s1*peso1 edad_s1 p17_total poly(time, 2)*grupo_int_v00,
random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
data = dat_longer, subset = !is.na(var_1))
}
Error in model.frame.default(formula = ~time var_1 sexo_s1 peso1 :
variable lengths differ (found for 'var_1')
I have also tried to develop a new function to iterate over
lme_z <- function(z){
out <- lme(z ~ sexo_s1*peso1 edad_s1 p17_total poly(time, 2)*grupo_int_v00,
random = ~ poly(time, 2)|paciente, control=lmeControl(opt="optim"),
data = dat_longer, subset = !is.na(z))
}
Error
If there is some contribution to iterate in the response variable (I know Ben Bolker is an expert) Thanks in advance
CodePudding user response:
If data
is a data frame containing all of the variables that you use in your formula, including all of the responses that you want to consider, then you can do:
f <- function(resp) {
fixed <- . ~ sexo_s1 * peso1 edad_s1 p17_total poly(time, 2) * grupo_int_v00
fixed[[2L]] <- as.name(resp)
lme(fixed = fixed,
random = ~poly(time, 2) | paciente,
data = data,
subset = !is.na(data[[resp]]),
control = lmeControl(opt = "optim"))
}
list_of_lme_objects <- lapply(names_of_response_variables, f)
An important piece is:
fixed <- . ~ sexo_s1 * peso1 edad_s1 p17_total poly(time, 2) * grupo_int_v00
fixed[[2L]] <- as.name(resp)
The second statement injects the response named resp
into the left hand side of the formula template. A more transparent example:
fixed <- . ~ world
fixed[[2L]] <- as.name("hello")
fixed
## hello ~ world
Another important piece is:
subset = !is.na(data[[resp]])
Here, the right hand side actually evaluates to a logical vector of length equal to the number of rows of data
. You might consider passing na.action = na.omit
instead of subset
, though that will also omit rows where the independent variables have missing values, so the semantics are slightly different.
The variable grupo_int_v00
is missing from your data frame. You'll have to fix that on your end in order to test the code...
CodePudding user response:
I was going to suggest:
formvars <- c("sexo_s1*peso1",
"edad_s1",
"p17_total",
"poly(time, 2)")
## excluded *grupo_int_v00 since not in example data frame
respvars <- names(df)[6:30]
for (r in respvars) {
lme_1 <- lme(reformulate(formvars, response = r),
random = ~ poly(time, 2)|paciente,
control=lmeControl(opt="optim"),
data = df,
na.action = na.exclude)
}
Many of @MikaelJagan's points are well taken. In particular:
grupo_int_v00
excluded since it wasn't in your example data set- this code doesn't work for your example since there are only two complete cases (i.e., observations with no missing predictors/responses) in the data set, so we can't fit a quadratic polynomial ("degree must be less than the number of unique points")
- I used
na.exclude
, which obviates yoursubset
argument; it excludesNA
values when fitting but will re-introduce them e.g. in calculating predictions or residuals