Here is an example of my dataset:
df <- data.frame(
id = c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65),
collection_point = c(rep(c("Baseline", "Immediate", "3M"), each=28)),
intervention = c(rep(c("B", "A", "C", "B", "C", "A", "A", "B", "A", "C", "B", "C",
"A", "A", "B", "A", "C", "B", "C", "A", "A"), each = 4)),
scale_A = c(6.5, 7.0, 6.25, 6.0, NA, 7.5, 7.5,
8.0, 7.5, 6.75, 7.5, 6.75, 6.75, 6.5,
5.75, 6.75, 7.75, 7.5, 7.75, 7.25, 7.75,
7.25, 7.25, 5.75, 6.75, NA, 6.75, 7.5,
6.75, 7.0, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.25, 7.25, 7.25, 7.5, 6.5, 6.25,
6.25, 7.25, 7.5, 6.75, 7.25, 7.25, 7.5,
7.25, 7.5, 7.25, NA, 7.0, 7.5, 7.5,
6.75, 7.25, 6.5, 7.0, 7.5, 7.5, 7.5,
7.75, 7.5, 7.5, 7.5, 7.5, 6.5, 5.75,
6.25, 6.75, 7.5, 7.25, 7.25, 7.5, 7.75,
7.75, 7.75, 7.5, NA, NA, NA, NA),
scale_B = c(5.0, 6.5, 6.25, 7.0, NA, 5.5, 6.5,
6.0, 7.5, 5.75, 6.5, 5.75, 7.75, 6.5,
6.75, 7.75, 7.75, 7.5, 7.75, 5.25, 7.75,
6.25, 6.25, 6.75, 5.75, NA, 6.75, 6.5,
7.75, 6.0, 7.5, 6.0, 7.5, 7.5, 6.5,
6.75, 6.25, 6.25, 6.25, 6.5, 6.5, 7.25,
7.25, 6.25, 6.5, 7.75, 6.25, 7.25, 6.5,
6.25, 6.5, 6.25, NA, 7.0, 6.5, 7.5,
7.75, 6.25, 7.5, 6.0, 7.5, 6.5, 6.5,
6.75, 6.5, 6.5, 6.5, 7.5, 7.5, 6.75,
7.25, 7.75, 6.5, 6.25, 7.25, 6.5, 6.75,
6.75, 6.75, 6.5, 5.5, NA, NA, 6.5),
scale_C = c(5.5, 5.0, 7.25, 7.0, 8.0, 5.5, 5.5,
8.0, 5.5, 7.75, 5.5, 7.75, 7.75, 7.5,
7.75, 7.75, 5.75, 5.5, 5.75, 5.25, 5.75,
5.25, 6.25, 7.75, 7.75, NA, 7.75, 5.5,
6.75, 6.0, 7.5, 5.0, 5.5, 5.5, 7.5,
5.75, 6.25, 5.25, 5.25, 5.5, 7.5, 7.25,
7.25, 6.25, 5.5, 7.75, 5.25, 5.25, 7.5,
5.25, 6.5, 5.25, 5.0, 5.0, 5.5, 5.5,
7.75, 6.25, 7.5, 5.0, 5.5, 5.5, 7.5,
5.75, 6.5, 5.5, 5.5, 5.5, 7.5, 7.75,
7.25, 7.75, 5.5, 5.25, 5.25, 5.5, 6.75,
5.75, 5.75, 5.5, 6.75, NA, 5.75, NA))
where:
- id = participant
- collection_point = times data were collected from the participant (repeated measure)
- intervention = group each participant was randomized to (fixed effect)
- scale_A = questionnaire score that each participant completed at each data collection point (outcome)
Participants were randomized to one of three interventions and completed the same scales (scales A-C) at three different time points to determine any improvements over time.
I have used the code
library(lme4)
mixed.lmer.A1 <- lmer(scale_A ~ intervention + (collection_point | id),
                      control = lmerControl(check.nobs.vs.nRE = "ignore"),
                      data = df)
but I would like to run a MANOVA, as all scales measure different aspects of a cohesive theme. However, I can't run
mixed.lmer.comb <- lmer(cbind(scale_A, scale_B, scale_C) ~ intervention +
                          (collection_point | id),
                        control = lmerControl(check.nobs.vs.nRE = "ignore"),
                        data = df)
like I originally thought. It does work if I use lm instead, but that wouldn't be very meaningful, as I need to account for collection_point as a repeated measure.
Is there a way I can model multiple dependent variables using lmer?
CodePudding user response:
You can do this by converting the data to long format. There are many ways to do this, e.g. reshape in base R or reshape2::melt, but I find tidyr::pivot_longer the easiest:
df_long <- tidyr::pivot_longer(df, cols = starts_with("scale_"),
names_to = "scales",
values_to = "value")
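As a quick sanity check on the reshape (a sketch, using the df and df_long objects from above), each original row should expand to three rows, one per scale:

```r
## df_long should have one row per id x collection_point x scale combination,
## with the score in `value`:
nrow(df_long) == 3 * nrow(df)  # TRUE if the reshape worked as intended
head(df_long)
```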
The fixed effects are 0 + scales + scales:intervention: we don't want an overall intercept, we want a scale-specific intercept, plus intervention effects for each scale.
The random effects are collection_point | scales/id: this allows the effect of collection point to vary across scales and across id (as in the original model).
mm <- lmer(value ~ 0 + scales + scales:intervention + (collection_point | scales/id),
           data = df_long,
           control = lmerControl(check.nobs.vs.nRE = "ignore"))
This model is technically correct, but gives a singular fit (which is not surprising, since we are trying to estimate a variance across only three levels of scales); see ?isSingular, or the GLMM FAQ, for advice about how to handle this.
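One common response to a singular fit (a sketch, not a recommendation for your particular data) is to simplify the random-effects structure, e.g. dropping the by-scales level and keeping only the by-id random slopes:

```r
## Simpler random-effects structure: random slopes of collection_point by id only
mm2 <- lmer(value ~ 0 + scales + scales:intervention + (collection_point | id),
            data = df_long,
            control = lmerControl(check.nobs.vs.nRE = "ignore"))
isSingular(mm2)  # FALSE would mean the simpler fit is away from the boundary
```

Whether this simplification is defensible depends on whether you need scale-specific random variation; model comparison (e.g. via AIC or a likelihood-ratio test, with the usual boundary caveats) can help decide.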
This is not the only model we could set up; a maximal model would include more terms.
Some further comments:
- One principle is that, since the elements of multivariate data generally have different units, we should not have any terms in the model that apply across scales (such as an overall intercept, or an overall effect of intervention); this might not apply in your particular case, I don't know.
- It is unusual (and often, although not always, wrong) to have a term varying in a random effect (collection_point in this case) that does not have a corresponding fixed effect. By doing so you are assuming that the average (population-level) effect of collection point is exactly zero, which is surprising unless (1) there's something special about the experimental design (e.g. observations were somehow randomized across collection points), (2) you have pre-processed the data in some way (e.g. explicitly transformed the data to have zero variance across collection points), or (3) you are setting up a null model for comparison.
- I'm a little concerned about your need to override the check that you have fewer random effects estimated than observations; I haven't looked into this in detail, but that usually means your model is overfitting in some way. (Maybe it's because we're looking at a subset of the data, and this doesn't come up in the full data set?)
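If the second point applies (i.e. the average effect of collection point should not be assumed to be zero), one way to add the corresponding fixed effects, by scale, would be the following sketch (same singularity caveats as above):

```r
## Scale-specific fixed effects of collection_point as well as intervention;
## random effects unchanged from the model above
mm3 <- lmer(value ~ 0 + scales + scales:intervention + scales:collection_point +
              (collection_point | scales/id),
            data = df_long,
            control = lmerControl(check.nobs.vs.nRE = "ignore"))
```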
More here.