I'm analysing the Boston
dataset from the
plot(predict(lm.Boston), residuals(lm.Boston))
Looking at the points on each plot and the axis values, there is clearly a significant difference between these two "residual" plots. However, I'm struggling to understand precisely what that difference is (besides the obvious fact that one uses the generic plot function and the other uses ggplot). Aren't these both plots of the residuals? If so, what is the difference? I assume my confusion stems from the complexity of the ggplot plot, since the base plot is quite straightforwardly a plot of the residuals vs predicted values.
CodePudding user response:
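To answer your question: no, they are not both plots of the residuals.
- The ggplot plot shows the observed data points together with their fitted values. The residuals are not on either axis; they appear only as the vertical distance between each observed point and its fitted value, so you can "measure" or "inspect" them from the plot.
- To plot the residuals themselves with ggplot, try this:
library(ggfortify)
autoplot(lm.Boston)
- You should get four diagnostic plots; the first one (residuals vs fitted values) is the one relevant for you.
- That first plot is similar to what you do with the base R plot, which really does plot the residuals (against the predicted values).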
Here is an example with the mtcars dataset:
library(ggplot2)
lm.mtcars <- lm(formula = cyl ~ mpg, data = mtcars)
mtcars$residuals <- lm.mtcars$residuals
mtcars$predicted <- predict(lm.mtcars)
# plot1 your first ggplot
# The model is cyl ~ mpg, so the predictor (mpg) goes on x and the response (cyl) on y
ggplot(mtcars, aes(x = mpg, y = cyl)) +
  geom_smooth(method = "lm", se = FALSE, color = "lightgrey") + # Plot regression line
  geom_segment(aes(xend = mpg, yend = predicted), alpha = 0.2) + # Residuals as faded vertical segments
  geom_point() + # Observed values
  geom_point(aes(y = predicted), shape = 1) + # Fitted values (open circles)
  theme_bw() # Add theme for cleaner look
# plot2 your R base plot
par(mfrow = c(1, 1))
plot(mtcars$predicted, mtcars$residuals)
# plot3
library(ggfortify)
autoplot(lm.mtcars, which = 1, ncol = 1)
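
If you would rather not load ggfortify, the residuals-vs-fitted plot can also be built by hand in ggplot2. The following is only a minimal sketch on the same mtcars model; the names `fit`, `diag_df` and `p` are made up for this illustration and are not part of the code above. It also checks numerically that each residual is just the observed value minus the fitted value, which is the vertical distance drawn as a segment in plot1.

```r
library(ggplot2)

# Same model as in the example above; mtcars ships with base R
fit <- lm(cyl ~ mpg, data = mtcars)

# Collect fitted values and residuals in a separate data frame
# (diag_df is a hypothetical name chosen for this sketch)
diag_df <- data.frame(
  fitted = fitted(fit),
  resid  = residuals(fit)
)

# A residual is the observed response minus the fitted value
stopifnot(isTRUE(all.equal(diag_df$resid,
                           mtcars$cyl - diag_df$fitted,
                           check.attributes = FALSE)))

# Residuals vs fitted values: the ggplot2 counterpart of
# plot(predict(fit), residuals(fit))
p <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey50") + # Zero-residual reference line
  geom_point() +
  labs(x = "Fitted values", y = "Residuals") +
  theme_bw()
print(p)
```

This should show the same points as plot2, only drawn with ggplot2; unlike the first autoplot panel it does not add a smoothed trend line.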