I am using the following code:
tc <- trainControl(method = "cv", number = 20)
lm1_cv <- train(y~., data = data, method = "lm", preProcess = c("center", "scale"),
trControl = tc)
lm1_cv
Which has the following output:
Linear Regression
1338 samples
6 predictor
Pre-processing: centered (8), scaled (8)
Resampling: Cross-Validated (20 fold)
Summary of sample sizes: 1272, 1272, 1270, 1271, 1270, 1272, ...
Resampling results:
RMSE Rsquared MAE
6048.516 0.7443666 4203.653
I have two questions:
1.) caret is performing 20-fold cross-validation. Is the average of all the testing data results stored in lm1_cv$results?
2.) If so, how do I access the average results (RMSE, etc) of all the training data?
Overall: My goal is to compare the performance of the model on training data vs testing data. But I'm not sure how to access both.
CodePudding user response:
Yes. train() runs the cross-validation and stores the summary in lm1_cv$results. For each of the 20 folds, the model is fit on the other 19 folds and evaluated on the held-out fold; lm1_cv$results holds those hold-out metrics (RMSE, Rsquared, MAE) averaged over the folds, together with their standard deviations (RMSESD, etc.). The per-fold values are in lm1_cv$resample. For example:
# Cross-validated metrics, already averaged over the 20 held-out folds
results <- lm1_cv$results
results$RMSE
# Per-fold hold-out metrics; the mean of this column equals results$RMSE
mean(lm1_cv$resample$RMSE)
Because only one candidate model is fit here, results has a single row, so results$RMSE is already the average and wrapping it in mean() changes nothing. These are the same numbers printed under "Resampling results" in your output.
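To see how the averaged row relates to the per-fold values, here is a small self-contained sketch. The mtcars data, the two predictors, the seed, and the 5 folds are stand-ins chosen for illustration, not the setup from the question.

```r
library(caret)

set.seed(42)  # arbitrary seed so the fold assignment is reproducible
tc <- trainControl(method = "cv", number = 5)
fit <- train(mpg ~ wt + hp, data = mtcars, method = "lm",
             preProcess = c("center", "scale"), trControl = tc)

# One row per held-out fold: RMSE, Rsquared, MAE, Resample
per_fold <- fit$resample
print(per_fold)

# A single row here; its metrics are averages over the folds
avg <- fit$results
print(avg[, c("RMSE", "Rsquared", "MAE")])

# Check that the averaged RMSE is the mean of the per-fold RMSE values
print(all.equal(mean(per_fold$RMSE), avg$RMSE))
```

The same check should carry over to lm1_cv by replacing fit with lm1_cv.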
caret does not store training-set performance, and resamples() will not produce it: that function collects the resampling results of several train() objects so that models can be compared, and its output has no "Training" or "Testing" columns. To get training performance, predict on the data the final model was fit to and score the predictions with postResample():
# Training (resubstitution) performance of the final model, which is refit on all rows
train_pred <- predict(lm1_cv, newdata = data)
postResample(pred = train_pred, obs = data$y)
# Cross-validated (hold-out) performance, averaged over the 20 folds
lm1_cv$results[, c("RMSE", "Rsquared", "MAE")]
predict() applies the stored centering and scaling before predicting, and postResample() returns RMSE, Rsquared and MAE in the same form as the resampling results. Comparing the two sets of numbers shows how much performance drops on data the model has not seen. Note that the training figure comes from the single final model, while the cross-validated figure is an average over 20 models, each fit on about 95% of the rows.
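If a training error for every fold is wanted next to the hold-out error, one option is to refit the model on each fold's training rows yourself. The following is a minimal sketch under some assumptions: it uses mtcars with two predictors and 5 folds as stand-in data, it relies on the fold row indices that train() keeps in fit$control$index, and the rmse() helper is defined here for illustration rather than taken from caret.

```r
library(caret)

set.seed(42)  # arbitrary seed so the fold assignment is reproducible
tc <- trainControl(method = "cv", number = 5)
fit <- train(mpg ~ wt + hp, data = mtcars, method = "lm", trControl = tc)

# Illustrative helper, not part of caret
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# fit$control$index holds one integer vector of training rows per fold
fold_rmse <- t(sapply(fit$control$index, function(train_rows) {
  m <- lm(mpg ~ wt + hp, data = mtcars[train_rows, ])
  held_out <- mtcars[-train_rows, ]
  c(Training = rmse(mtcars$mpg[train_rows], fitted(m)),
    Testing  = rmse(held_out$mpg, predict(m, newdata = held_out)))
}))

print(fold_rmse)            # one row per fold
print(colMeans(fold_rmse))  # average training vs hold-out RMSE
```

The Testing average computed this way should agree with fit$results$RMSE, which is a useful sanity check that the folds are being reused correctly.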