I understand that when you generate a linear model, you can pull the residuals from the fit like this:
# Model
model <- lm(y ~ x, data = coolstuff)
# Residuals
myresids <- model$residuals
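As a quick check on what `model$residuals` holds, the sketch below uses the built-in `mtcars` data as a stand-in for `coolstuff` (an assumption, since that data isn't shown). It confirms that the stored residuals are just the observed response minus the fitted values, which is the same subtraction you would apply to new data:

```r
# Stand-in for the question's data: built-in mtcars, mpg modelled on disp
fit <- lm(mpg ~ disp, data = mtcars)

# Residuals stored in the model object (same values as fit$residuals)
stored <- residuals(fit)

# Observed minus fitted, computed by hand
by_hand <- mtcars$mpg - fitted(fit)

all.equal(stored, by_hand)  # TRUE
```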
I understand further that you can use this model to predict values based on a second data set (e.g., a validation data set) like this:
mypreds <- predict(model, newdata = coolvalid)
Where I'm lost is finding the residuals for those predictions. predict() doesn't return a data frame or a tibble, just a named numeric vector.
Where can I find the residuals from the predictions?
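To make the shape of predict()'s output concrete, here is a small check. It again uses `mtcars` as a stand-in (an assumption: the first 20 rows play `coolstuff` and the remaining 12 play `coolvalid`). The result is a plain numeric vector whose names are the row names of `newdata`, so it lines up element for element with the observed values already in the validation data:

```r
fit_rows   <- mtcars[1:20, ]   # stand-in for coolstuff
valid_rows <- mtcars[21:32, ]  # stand-in for coolvalid

fit   <- lm(mpg ~ disp, data = fit_rows)
preds <- predict(fit, newdata = valid_rows)

class(preds)                                   # "numeric": not a list or data frame
identical(names(preds), rownames(valid_rows))  # TRUE: one prediction per validation row
```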
Answer:
As DanY points out in the comments, the residual of a prediction is simply the observed value minus the predicted value. See below for a simple example using the built-in mtcars data.
# sample data: split mtcars in half into training and test sets
set.seed(1)
split_indices <- sample(nrow(mtcars), nrow(mtcars)/2)
train <- mtcars[split_indices,]
test <- mtcars[-split_indices,]
# model
model <- lm(mpg ~ disp, data = train)
# residuals of prediction are observed - predicted
# (new rows must be passed to predict() as newdata, not data)
test$mpg - predict(model, newdata = test)
# returns one residual per row of test, named by the row names of test
Created on 2022-02-07 by the reprex package (v2.0.1)
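If you need this more than once, the subtraction can be wrapped in a small helper. This is an illustrative sketch: `holdout_residuals()` is a hypothetical name, and it assumes the validation data contains the response column and has no missing values (otherwise model.frame() drops rows and the vectors no longer line up). It evaluates the response through the model's own terms, so a transformed response such as log(y) is treated the same way lm() treated it. The last line shows why the argument must be called newdata: predict.lm() has no `data` argument, so `data = test` is silently absorbed by `...` and the training-set fitted values come back instead.

```r
# Hypothetical helper: observed, predicted, and residual for new data.
# Assumes `newdata` contains the response and has no missing values.
holdout_residuals <- function(model, newdata) {
  # Evaluate the response the same way lm() did (handles e.g. log(y))
  observed  <- model.response(model.frame(terms(model), data = newdata))
  predicted <- predict(model, newdata = newdata)
  data.frame(observed  = observed,
             predicted = predicted,
             residual  = observed - predicted)
}

set.seed(1)
split_indices <- sample(nrow(mtcars), nrow(mtcars) / 2)
train <- mtcars[split_indices, ]
test  <- mtcars[-split_indices, ]
model <- lm(mpg ~ disp, data = train)

res <- holdout_residuals(model, test)
head(res)

# Root-mean-square error of the out-of-sample predictions
sqrt(mean(res$residual^2))

# Pitfall: `data =` is ignored, so this compares training fitted values with themselves
all.equal(predict(model, data = test), fitted(model))  # TRUE
```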