Train/Test Set Proportion in cv.glmnet from glmnet package in R

I was wondering what percentage of the data goes to the train and test sets in cv.glmnet from the glmnet package in R. I have already read the glmnet package documentation and found no information on the train/test set proportion. Please tell me if I missed something in the package documentation. Any help would be greatly appreciated. Thank you.

CodePudding user response：

From the help page for ?cv.glmnet, there are two parts to look at:

Argument nfolds

number of folds - default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), it is not recommended for large datasets. Smallest value allowable is nfolds=3

And from the Value section, for foldid:

if keep=TRUE, the fold assignments used

i.e., set keep=TRUE in the function call to access the fold assignments afterwards.

With the default nfolds=10, the function assigns each row to one of 10 roughly equal-sized groups (folds). It then fits the model 10 times, holding out one fold each time for testing. So each fit uses 90% of the data for training and 10% for testing, repeated 10 times so that every row is held out exactly once.
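To check the proportions yourself, here is a minimal sketch on simulated data (the data, the seed, and the variable names are made up for illustration). It only relies on the keep=TRUE behaviour described above, which makes the fold assignments available as foldid on the fitted object:

```r
library(glmnet)

set.seed(1)                      # fold assignment is random, so fix the seed
n <- 200
x <- matrix(rnorm(n * 5), n, 5)  # simulated predictors (illustrative only)
y <- x[, 1] + rnorm(n)           # simulated response

# Default is nfolds = 10; keep = TRUE stores the fold assignments
cvfit <- cv.glmnet(x, y, nfolds = 10, keep = TRUE)

fold_sizes <- table(cvfit$foldid)          # number of rows held out per fold
test_prop  <- as.numeric(fold_sizes) / n   # test share in each iteration
train_prop <- 1 - test_prop                # train share in each iteration

print(fold_sizes)
print(round(cbind(train = train_prop, test = test_prop), 3))
```

In general the test share per iteration is roughly 1/nfolds, so nfolds=5 would give about 80% train and 20% test.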

You can supply your own folds with the foldid argument if you prefer. Hope that helps :)
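As a rough sketch of supplying your own folds (again on simulated data, with my_folds being a made-up name), you could build a vector with one fold number per row and pass it as foldid. Five equal folds here means each iteration trains on about 80% of the rows and tests on the remaining 20%:

```r
library(glmnet)

set.seed(2)
n <- 100
x <- matrix(rnorm(n * 5), n, 5)  # simulated predictors (illustrative only)
y <- x[, 1] + rnorm(n)           # simulated response

# Hypothetical assignment: shuffle the labels 1..5 so each fold gets 20 rows
my_folds <- sample(rep(1:5, length.out = n))

# When foldid is given, the number of folds is taken from it
cvfit5 <- cv.glmnet(x, y, foldid = my_folds, keep = TRUE)

print(table(my_folds))           # rows held out in each of the 5 iterations
```

Note that the documented minimum of 3 folds still applies, so a single 50/50 split cannot be expressed this way.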

CodePudding user response：

The train set is the data set on which the model is trained.
The test set is the data set on which the model's performance is evaluated.

The model is trained on the train set, and its performance is then evaluated on the test set. Once the test set satisfies some condition, it can be used as a TL-test set for further evaluation or as a seed for another train/test split, etc.

Detecting feature importance is another reason to have a test set, while classification models are models in which each prediction has a single correct answer.

New neural network models are usually evaluated with evaluation sets of 8n%, 25%, 50% of their training sets respectively, i.e., if the training set is of size N, then the test set can be of size N, N/2, N/4, or N/8.