I'm analysing a data set with the weight of newborn babies and some info about their mothers, including a categorical variable 'smoke' - whether a mother is a smoker, or not.
I did an aov test and wanted to plot diagnostic plots of an ANOVA model with its help. I expected to get four plots, including a 'Residuals vs Factor Levels' plot. Instead, I got a 'Residuals vs Leverage' plot, as if my categorical variable was a numeric one.
You can find the dataset here:
I tried to check whether 'smoke' is a factor as I wanted or not, like that:
> head(babies$smoke)
[1] non smoker non smoker smoker non smoker smoker non smoker
Levels: non smoker smoker
Since 'smoke' is a factor (as I understand) and a categorical variable, why is there leverage as per numeric variable? How to fix this and get the proper plot?
Thanks for the help in advance!
CodePudding user response:
There's no plot.aov
method, so you got the plot.lm
method (since the class of aov(babies$bwt ~ babies$smoke)
would be c("aov", "lm")
).
If you want to plot something else for "aov"
objects, write a function plot.aov
to do it. For example,
plot.aov <- function(x, which = 1:3, ...) {
NextMethod(x, which = which, ...)
for (f in names(x$xlevels)) {
xvar <- x$model[, f]
plot(xvar, residuals(x), xlab = f, ylab = "Residuals")
}
}
Then
plot(aov(babies$bwt ~ babies$smoke))
will plot the first 3 plot.lm
plots, followed by residual plots against the factor in your model. (It will do each of the factors if there are others.) I get
Alternatively, you can use the termplot
function, e.g.
termplot(aov(babies$bwt ~ babies$smoke), partial.resid=TRUE)
which gives this plot: