Application and Deployment of K-Fold Cross-Validation

K-fold cross-validation is a technique that splits the data into K folds for training and testing. The goal is to estimate how well a machine learning model generalizes. The model is trained K times: each time it is trained on K-1 of the folds and then tested on the one fold that was held out.
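To make the splitting concrete, here is a minimal sketch in plain Python. The function name `k_fold_indices` and its parameters are made up for illustration and do not come from any library; it only produces index lists and leaves the actual training to the caller.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Hypothetical helper for illustration only, not a library function.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly

    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]                 # the held-out fold
        train_idx = indices[:start] + indices[start + size:]   # the other k-1 folds
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=23, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Every sample lands in exactly one test fold, so each sample is used for testing once and for training K-1 times.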

Suppose I want to compare a Decision Tree and a Logistic Regression model on some arbitrary dataset using 10-fold cross-validation. Suppose that, after training each model on each of the 10 train splits and obtaining the corresponding test accuracies, Logistic Regression has the higher mean accuracy across the test folds, suggesting that it is the better model for the dataset.

Now, for application and deployment. Do I retrain the Logistic Regression model on all the data, or do I create an ensemble from the 10 Logistic Regression models that were trained on the K-Folds?

CodePudding user response：

The main goal of CV is to check that the performance numbers are not an accident of one particular train/test split. So, I believe you can just use a single model for deployment.

If you are already satisfied with the hyper-parameters and the model performance, one option is to train on all the data that you have and deploy that model.

The other, obvious option is to deploy one of the CV models.

As for the ensemble option, I believe it should not give significantly better results than a model trained on all the data. Each fold model trains for the same amount of time with similar parameters and has the same architecture; only the training data differs slightly. So they should not differ much in performance. In my experience, an ensemble helps when the models' outputs differ because of architecture or input data (like different image sizes).
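One way to check this claim on your own data is to build both variants and compare them on samples that neither has seen. The sketch below assumes scikit-learn and uses a synthetic dataset as a stand-in for the real one. The ensemble here averages predicted probabilities, which is just one possible way of combining the fold models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split

# Synthetic stand-in data (assumption); X_new plays the role of unseen data.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Keep the 10 fitted fold models instead of discarding them.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = cross_validate(
    LogisticRegression(max_iter=1000), X_train, y_train, cv=cv, return_estimator=True
)
fold_models = results["estimator"]

# Ensemble: average the class probabilities predicted by the fold models.
ensemble_proba = np.mean([m.predict_proba(X_new) for m in fold_models], axis=0)
ensemble_pred = fold_models[0].classes_[ensemble_proba.argmax(axis=1)]

# Alternative: a single model fitted on all of the training data.
single_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
single_pred = single_model.predict(X_new)

agreement = float(np.mean(ensemble_pred == single_pred))
ensemble_acc = float(np.mean(ensemble_pred == y_new))
single_acc = float(np.mean(single_pred == y_new))
print(f"agreement={agreement:.3f} ensemble={ensemble_acc:.3f} single={single_acc:.3f}")
```

If the agreement is close to 1, the ensemble adds serving cost (ten models instead of one) without changing the predictions much.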

CodePudding user response：

The models trained during k-fold CV should never be reused. CV is only used for reliably estimating the performance of a model.

As a consequence, the standard approach is to re-train the final model on the full training data after CV.
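A minimal sketch of that workflow, assuming scikit-learn and a synthetic dataset in place of the real one (the dictionary keys and variable names are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the "arbitrary dataset" in the question (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Use the same 10 folds for both candidates so the comparison is like for like.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(mean_accuracy, key=mean_accuracy.get)

# cross_val_score fits clones, so the candidate object itself is still unfitted.
# Fit it once on the full training data; this is the model to deploy.
final_model = candidates[best_name].fit(X, y)
print(best_name, mean_accuracy)
```

The ten models fitted inside `cross_val_score` are thrown away. Only their scores are used, to decide which kind of model gets the final fit.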

Note that evaluating different models is akin to hyper-parameter tuning, so in theory the performance of the selected best model should be reevaluated on a fresh test set. But with only two models tested I don't think this is important in your case.

You can find more details about k-fold cross-validation here and there.