I'm working on a 3-way ANCOVA in R: 3 categorical predictors, 1 non-negative continuous covariate, and 1 non-negative continuous response variable. I've worked through all the assumptions, omitted one extreme outlier, and arrived at the following model:
model <- lm(response_variable ~ centered_covariate + predictor_1 * predictor_2 * predictor_3, data = data)
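(For reference, the centering step looks like this, with placeholder column names standing in for my real ones:)
data$centered_covariate <- data$covariate - mean(data$covariate)  ## mean-center the raw covariate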
In the past, I have seen the model followed up with:
summary(model)
and
summary.aov(model)
I can't remember the reason to run both lines of code. Does this look familiar to anyone who knows why we need both? I can provide the output if that is helpful.
CodePudding user response:
A reproducible example based on your description:
set.seed(1)  ## so the simulated example is reproducible
dat <- data.frame(y = rnorm(150), x = rnorm(150),
f1 = sample(gl(3, 50, labels = letters[1:3])),
f2 = sample(gl(3, 50, labels = letters[1:3])),
f3 = sample(gl(3, 50, labels = letters[1:3])))
model <- lm(y ~ x + f1 * f2 * f3, data = dat)
## ?
summary(model)
## ?
summary.aov(model)
I can't remember the reason to run both lines of code.
You get different summary statistics from the two summary functions.
summary(model) is in fact summary.lm(model), because model is fitted using lm. This gives you a t-statistic for each fitted coefficient.
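If you want that coefficient table as a plain matrix (to pull out estimates or p-values programmatically), you can extract it from the summary object; for example, with the model above:
coef(summary(model))  ## columns: Estimate, Std. Error, t value, Pr(>|t|)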
summary.aov(model) is effectively doing anova(model), giving you an ANOVA table with F-statistics. This is most useful here because you have interactions between factor variables.
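You can check that for yourself by printing both tables for the model above; the F statistics and p-values match:
anova(model)        ## sequential (Type I) ANOVA table
summary.aov(model)  ## same F tests, printed in the aov layout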
Note that you may also fit your model using aov.
MODEL <- aov(y ~ x + f1 * f2 * f3, data = dat)
Now summary(MODEL) is in fact summary.aov(MODEL), because MODEL is fitted by aov.
## ANOVA table, the same as summary.aov(model)
summary(MODEL)
If you want the t-statistics, you need to use:
## the same as summary(model)
summary.lm(MODEL)
Confusing? Well, bear in mind that summary() is a generic function with different methods. Obviously, the "lm" and "aov" methods produce different things.
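You can see which method will be dispatched by looking at the class of each object:
class(model)   ## "lm"        -> summary() uses summary.lm()
class(MODEL)   ## "aov" "lm"  -> summary() uses summary.aov()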
Reply
I was getting so worked up trying to figure out why the output differed and why the significance of the predictors differed between the two functions; then I realized the same thing after taking a break! Thank you so much.
There was also little explanation in my notes about lm vs aov, so I appreciate it.
lm and aov perform the same computations internally, but they prioritize different summary output when you simply call summary(). Calling summary.lm() or summary.aov() explicitly makes it clear which table you want.
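As a quick sanity check, the two fits from the answer contain the same coefficients; this should return TRUE:
all.equal(coef(model), coef(MODEL))  ## identical fitted coefficients from lm and aov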
Is there a best way to move forward now, knowing that the interaction between f2 and f3 is significant?
Usually we want to see if we can simplify our model structure. At the moment f1 * f2 * f3 includes all possible interactions, namely f1, f2, f3, f1:f2, f1:f3, f2:f3 and f1:f2:f3. Perhaps some of them can be dropped. But since my example data are just random values, the reported significance is pointless. If you need specific advice for your real data, you can post a question on https://stats.stackexchange.com/.