Using the Weekly dataset from the ISLR package in R:
> head(Weekly)
Year Lag1 Lag2 Lag3 Lag4 Lag5 Volume Today Direction
1 1990 0.816 1.572 -3.936 -0.229 -3.484 0.1549760 -0.270 Down
2 1990 -0.270 0.816 1.572 -3.936 -0.229 0.1485740 -2.576 Down
3 1990 -2.576 -0.270 0.816 1.572 -3.936 0.1598375 3.514 Up
4 1990 3.514 -2.576 -0.270 0.816 1.572 0.1616300 0.712 Up
5 1990 0.712 3.514 -2.576 -0.270 0.816 0.1537280 1.178 Up
6 1990 1.178 0.712 3.514 -2.576 -0.270 0.1544440 -1.372 Down
I am trying to use logistic regression to regress Direction on *all Lag variables and Volume*, and I used the "all except" shortcut in R to exclude Year and Today:
logregall <- glm(Direction ~ . - Today - Year,
family=binomial(link='logit'), data = Weekly)
But when I try to use this fitted object to make predictions, R gives an error saying that Year is missing from the 'newdata' data frame, even though Year was excluded from the model:
dataforpred <- Weekly[,2:7]
> preds <- predict(object = logregall, newdata = dataforpred, type = "response")
Error in eval(predvars, data, env) : object 'Year' not found
But when I fit the model by typing out all the variables manually, I get a fitted object that works with predict():
logregall2 <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
family=binomial(link='logit'), data = Weekly)
preds <- predict(object = logregall2, newdata = dataforpred, type = "response")
> head(preds)
1 2 3 4 5 6
0.6086249 0.6010314 0.5875699 0.4816416 0.6169013 0.5684190
Why is this the case?
CodePudding user response:
I don't have the package, but I can replicate the error with the mtcars dataset. I believe the reason is that removing terms with - only drops them from the fitted model; the variables themselves are still recorded in the formula. When predict() builds the model frame, it evaluates every variable in the formula, including the removed ones, so it throws an error when it cannot find those columns in newdata.
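One way to check where the extra variable lookup comes from is to inspect the terms object for the same formula. This is a minimal sketch on mtcars; the names `tt`, `needed` and `used` are illustrative, not part of the original code.

```r
# Build the terms object for the same formula; no model fit is needed for this check.
tt <- terms(vs ~ . - mpg - cyl, data = mtcars)

# Variables that model.frame() evaluates at predict time,
# and therefore expects to find in newdata
needed <- all.vars(attr(delete.response(tt), "variables"))

# Terms that actually enter the model matrix
used <- attr(tt, "term.labels")

# Variables that are looked up even though they are not in the model
extra <- setdiff(needed, used)
print(extra)
```

If the explanation above is right, `extra` should contain the variables that were subtracted in the formula, which are exactly the ones predict() complains about.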
Therefore, one workaround is to add those columns back to newdata with arbitrary values; since they are not part of the model, their values are not used in the prediction.
fit <- glm(vs~. -mpg-cyl,data=mtcars,
family=binomial(link='logit'))
dataforpred <- mtcars[,c(3:7,9:11)]
preds <- predict(object = fit, newdata = dataforpred, type = "response")
Error in eval(predvars, data, env) : object 'cyl' not found
# solution
library(dplyr)  # provides %>% and mutate()
dataforpred2 <- dataforpred %>%
  mutate(mpg = NA_real_,
         cyl = NA_real_)
preds2 <- predict(object = fit, newdata = dataforpred2, type = "response")
> preds2[1:5]
1 2 3 4 5
2.220446e-16 1.081386e-11 1.000000e+00 1.000000e+00 2.220446e-16
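An alternative that avoids dummy columns is to drop the unwanted columns from the data before fitting, so that `.` only expands to the predictors you actually want. The sketch below again uses mtcars as a stand-in for Weekly; `train`, `fit2` and `newdata2` are hypothetical names introduced for illustration.

```r
# Keep only the response and the predictors that should enter the model
train <- mtcars[, setdiff(names(mtcars), c("mpg", "cyl"))]

# "." now expands to the remaining columns only, so mpg and cyl never
# appear in the formula. suppressWarnings() is used because this small
# dataset may emit glm warnings about fitted probabilities of 0 or 1,
# which are unrelated to the newdata issue.
fit2 <- suppressWarnings(
  glm(vs ~ ., data = train, family = binomial(link = "logit"))
)

# newdata only needs the predictors that are in the model
newdata2 <- train[, setdiff(names(train), "vs")]
preds3 <- predict(fit2, newdata = newdata2, type = "response")
head(preds3)
```

For the Weekly data the same idea would be to fit on the data frame without Year and Today (for example by subsetting with `setdiff(names(Weekly), c("Year", "Today"))`) and use `Direction ~ .` as the formula.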