I'm creating a lasso model and keep getting errors. Please help me solve:
library(plotmo)
x=model.matrix(Salary~.-1,data=Hitters)
y=Hitters$Salary
cv.lasso=cv.glmnet(x,y)
plot(cv.lasso, label=5)
coef(cv.lasso)
lasso.tr=glmnet(x[train,],y[train])
pred=predict(lasso.tr,x[-train,])
dim(pred)
rmse= sqrt(apply((y[-train]-pred)^2,2,mean))
plot(log(lasso.tr$lambda),rmse,type="b",xlab="Log(lambda)")
lam.best=lasso.tr$lambda[order(rmse)[1]]
lam.best
coef(lasso.tr,s=lam.best)
Error is:
Error in glmnet(x, y, weights = weights, offset = offset, lambda = lambda, : number of observations in y (322) not equal to the number of rows of x (263)
CodePudding user response:
This error occurs because y has more observations (322) than x has rows (263). If you look at the data, you can see that there are missing values for Salary. model.matrix drops those rows when it builds x, while y still contains them. You should remove the observations that have missing Salary values:
library(ISLR)    # provides the Hitters data
library(glmnet)  # provides glmnet() and cv.glmnet()
library(plotmo)
Hitters = Hitters[!is.na(Hitters$Salary), ]
x=model.matrix(Salary~.-1,data=Hitters)
y=Hitters$Salary
cv.lasso=cv.glmnet(x,y)
plot(cv.lasso, label=5)
coef(cv.lasso)
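# train is never defined in the question; assumed here to be a random half of the row indices
train=sample(nrow(x),nrow(x)%/%2)
lasso.tr=glmnet(x[train,],y[train])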
pred=predict(lasso.tr,x[-train,])
dim(pred)
rmse= sqrt(apply((y[-train]-pred)^2,2,mean))
plot(log(lasso.tr$lambda),rmse,type="b",xlab="Log(lambda)")
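To see why the counts differ, here is a minimal sketch on a toy data frame. The name df and its values are made up for illustration and are not taken from Hitters; the sketch assumes R's default na.action option (na.omit) and needs no extra packages.

```r
# Toy stand-in for Hitters (hypothetical values): two rows have a missing Salary
df <- data.frame(
  Salary = c(100, NA, 250, NA, 400),
  Hits   = c(10, 20, 30, 40, 50),
  Years  = c(1, 2, 3, 4, 5)
)

# model.matrix() builds a model frame first, and with the default
# na.action (na.omit) any row with an NA in the formula's variables is dropped
x_bad <- model.matrix(Salary ~ . - 1, data = df)
# y taken straight from the data frame still contains the NA rows
y_bad <- df$Salary

nrow(x_bad)    # 3
length(y_bad)  # 5

# Fix: drop the incomplete rows first, then build x and y from the same data
df_ok <- df[!is.na(df$Salary), ]
x_ok <- model.matrix(Salary ~ . - 1, data = df_ok)
y_ok <- df_ok$Salary

nrow(x_ok) == length(y_ok)  # TRUE
```

The same mismatch, 263 rows in x against 322 values in y, is what glmnet reports in the question.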
CodePudding user response:
First off, please always make sure your post is self-contained by explicitly stating any non-base R packages. This is particularly important when you refer to an external package for sample data (ISLR::Hitters).
The issue you're seeing has to do with observations containing missing values in ISLR::Hitters and how they are treated (i.e., omitted!) within model.matrix
(see e.g.