I want to use LASSO for variable selection in a Cox PH model in R. I found this code somewhere and ran my analysis with it, but elsewhere I read that it is for elastic net. Could someone please confirm that I am using the right code?
lasso <- cv.glmnet(xmat, ysurv, alpha = 1, family = "cox", nfolds = 30)
CodePudding user response:
The help page for cv.glmnet() (type ?cv.glmnet in R or go through the help system in RStudio) isn't useful here because the alpha parameter is passed through to glmnet(). The ?glmnet help page documents it as follows:
alpha: The elasticnet mixing parameter, with 0<=alpha<= 1. The penalty is defined as
(1-alpha)/2||beta||_2^2+alpha||beta||_1.
‘alpha=1’ is the lasso penalty, and ‘alpha=0’ the ridge penalty.
So alpha=1 is lasso (as described there), and alpha=0.95 is a mixture that is mostly lasso (L1) with a little bit of ridge (L2) mixed in.
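To make the mixing concrete, here is a minimal base-R sketch of the penalty term from the help text (without the lambda multiplier that glmnet applies to it). The function name enet_penalty and the coefficient vector beta are made up for illustration; they are not part of glmnet.

```r
# Elastic-net penalty as written in ?glmnet, without the lambda multiplier:
#   (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1
# enet_penalty() is a hypothetical helper, not a glmnet function.
enet_penalty <- function(beta, alpha) {
  stopifnot(alpha >= 0, alpha <= 1)
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

beta <- c(0.5, -1, 0, 2)  # hypothetical coefficient vector

lasso_pen <- enet_penalty(beta, alpha = 1)     # pure L1: sum(abs(beta))
ridge_pen <- enet_penalty(beta, alpha = 0)     # pure L2: sum(beta^2) / 2
mixed_pen <- enet_penalty(beta, alpha = 0.95)  # mostly L1, a little L2
```

With alpha = 1 the squared term drops out entirely, which is why the call in the question is a plain lasso fit.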
I doubt there's much of a difference between 10-fold and 30-fold cross-validation. The reasons you might choose a different number of folds are (1) computational efficiency (the cost grows with the number of folds unless there is some trick for computing the CV score without refitting the model, as is often the case for LOOCV) and (2) the bias-variance tradeoff; see section 5.1.4 of An Introduction to Statistical Learning with Applications in R.
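If you want to check this on your own data, one option is to run the lasso fit with both fold counts and compare the results. The sketch below uses simulated data, so everything apart from the cv.glmnet() call itself is an assumption: the sample size, the effect sizes, and the censoring mechanism are invented, and the response is built as a two-column matrix with columns named time and status rather than whatever ysurv is in the question.

```r
library(glmnet)

set.seed(1)
n <- 200
p <- 10
xmat <- matrix(rnorm(n * p), n, p)  # simulated predictors (assumption)

# Simulated survival times: only the first two predictors affect the hazard.
event_time <- rexp(n, rate = exp(0.8 * xmat[, 1] - 0.8 * xmat[, 2]))
cens_time  <- rexp(n, rate = 0.3)   # independent censoring times
ysurv <- cbind(time   = pmin(event_time, cens_time),
               status = as.numeric(event_time <= cens_time))

# Same lasso (alpha = 1) Cox fit, cross-validated with 10 and with 30 folds.
cv10 <- cv.glmnet(xmat, ysurv, family = "cox", alpha = 1, nfolds = 10)
cv30 <- cv.glmnet(xmat, ysurv, family = "cox", alpha = 1, nfolds = 30)

# Compare the selected penalty strength and the retained coefficients.
lambdas <- c(nfolds10 = cv10$lambda.min, nfolds30 = cv30$lambda.min)
coef10  <- coef(cv10, s = "lambda.min")
coef30  <- coef(cv30, s = "lambda.min")
```

Coefficients that are exactly zero in coef10 or coef30 are the variables the lasso dropped. Fold assignment is random, so the selected lambda will also vary from run to run unless you fix the seed or pass foldid.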
Follow-up questions that are more statistical or data-sciencey than computational should probably go to CrossValidated.