Home > front end >  Why R plots Residuals vs Leverage instead of Residuals vs Factor Levels (ANOVA test & model with �
Why R plots Residuals vs Leverage instead of Residuals vs Factor Levels (ANOVA test & model with �

Time:01-09

I'm analysing a data set with the weight of newborn babies and some info about their mothers, including a categorical variable 'smoke' - whether a mother is a smoker, or not.

I did an aov test and wanted to plot diagnostic plots of an ANOVA model with its help. I expected to get four plots, including a 'Residuals vs Factor Levels' plot. Instead, I got a 'Residuals vs Leverage' plot, as if my categorical variable was a numeric one.

You can find the dataset here: enter image description here

I tried to check whether 'smoke' is a factor as I wanted or not, like that:

> head(babies$smoke)
[1] non smoker non smoker smoker     non smoker smoker     non smoker
Levels: non smoker smoker

Since 'smoke' is a factor (as I understand) and a categorical variable, why is there leverage as per numeric variable? How to fix this and get the proper plot?

Thanks for the help in advance!

CodePudding user response:

There's no plot.aov method, so you got the plot.lm method (since the class of aov(babies$bwt ~ babies$smoke) would be c("aov", "lm")).

If you want to plot something else for "aov" objects, write a function plot.aov to do it. For example,

plot.aov <- function(x, which = 1:3, ...) {
  NextMethod(x, which = which, ...)
  for (f in names(x$xlevels)) {
    xvar <- x$model[, f]
    plot(xvar, residuals(x), xlab = f, ylab = "Residuals")
  }
}

Then

plot(aov(babies$bwt ~ babies$smoke))

will plot the first 3 plot.lm plots, followed by residual plots against the factor in your model. (It will do each of the factors if there are others.) I get

screenshot

Alternatively, you can use the termplot function, e.g.

termplot(aov(babies$bwt ~ babies$smoke), partial.resid=TRUE)

which gives this plot:

enter image description here

  •  Tags:  
  • Related