R: caret's train() with poly() of a given degree

I'm looping over the degree of the approximating polynomial when training a model with caret:

ds = 1:20
for(i in 1:length(ds)){
  print(i)
  d=ds[i]
  fit = train(y~poly(x,degree=d),data=training,method="lm",trControl=fitCtrl)
  # other operations
}

Running the code gives:

Error in `[.data.frame`(data, 0, cols, drop = FALSE) : 
undefined columns selected

Setting d = 4 and passing degree = d doesn't work either, but writing the degree directly in the call, i.e. degree = 4, works.

Any idea what's going on here?

Thanks!

EDIT:

library(caret)
set.seed(1)
omega = 0.5*pi
xi = 0.5
phi = 0.5*pi
f = function(t)1-exp(-xi*omega*t)*sin(sqrt(1-xi^2)*omega*t + phi)/sin(phi)
sigma = 0.03
train.n = 100
x = seq(0,2*pi,by=2*pi/(train.n-1))
y = f(x) + rnorm(train.n,mean=0,sd=sigma)
training = data.frame(x=x,y=y)
fitCtrl <- trainControl(method = "LOOCV",verboseIter = FALSE)
ds = 1:20
for(i in 1:length(ds)){
  print(i)
  d=4
  # works; changing degree=4 to degree=d gives the error above
  fit=train(y~poly(x,degree=4),data=training,method="lm",trControl=fitCtrl)
}

Answer:

A formula treats every name in it, including d, as a variable to look up in the data, just as it does for y and x here.

To make R interpret d as a number, wrap it in I().

Reproducible example using mtcars:

library(dplyr, warn.conflicts = FALSE) # For the pipe

# Only showing two iterations to illustrate that code is working
ds <- 1:2

for(i in 1:length(ds)){
  
  d <- ds[i]
  
  lm(mpg~poly(hp, degree = I(d)), data = mtcars) %>% 
    coefficients() %>% 
    print()
  
}
#>             (Intercept) poly(hp, degree = I(d)) 
#>                20.09062               -26.04559 
#>              (Intercept) poly(hp, degree = I(d))1 poly(hp, degree = I(d))2 
#>                 20.09062                -26.04559                 13.15457

Created on 2022-03-19 by the reprex package (v2.0.1)
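As a quick base-R check of the lookup described above, all.vars() lists every name a formula refers to. The sketch below uses a hypothetical data frame toy in place of training; it shows that d is one of those names, and that selecting data-frame columns by those names reproduces the error message from the question. I'm assuming caret builds its column list from the formula's variable names in a similar way; I haven't traced caret's source to confirm it.

```r
# Hypothetical toy data standing in for `training`
toy <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))

# all.vars() returns every variable name used in a formula
f_var <- y ~ poly(x, degree = d)
all.vars(f_var)
#> [1] "y" "x" "d"

# With the degree written as a literal, d no longer appears
f_lit <- y ~ poly(x, degree = 4)
all.vars(f_lit)
#> [1] "y" "x"

# Selecting columns by those names fails because toy has no column d
msg <- tryCatch(toy[0, all.vars(f_var), drop = FALSE],
                error = conditionMessage)
msg
#> [1] "undefined columns selected"
```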

Answer:

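Another option is to build the formula as a string with paste0() and convert it with as.formula(), so the value of d is written into the formula text before train() sees it: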

d <- 4
train(as.formula(paste0('y ~ poly(x, degree =', d, ')')), 
      data = training, method = "lm", trControl = fitCtrl)

Output:

Linear Regression 

100 samples
  1 predictor

No pre-processing
Resampling: Leave-One-Out Cross-Validation 
Summary of sample sizes: 99, 99, 99, 99, 99, 99, ... 
Resampling results:

  RMSE        Rsquared   MAE       
  0.03790195  0.9779768  0.02937452
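The same substitution can be done without building a string, using base R's bquote(): .(d) is replaced by the current value of d, and eval() turns the resulting call into an ordinary formula. This is a sketch with simulated data (a hypothetical stand-in for training); I've only checked it with lm() here, not with train(), though the formula it produces is the same one the paste0() call builds.

```r
# Hypothetical simulated data standing in for `training`
set.seed(1)
x <- seq(0, 2 * pi, length.out = 100)
training <- data.frame(x = x, y = sin(x) + rnorm(100, sd = 0.03))

d <- 4
# bquote() replaces .(d) with the value of d; eval() builds the formula
form <- eval(bquote(y ~ poly(x, degree = .(d))))
deparse(form)
#> [1] "y ~ poly(x, degree = 4)"
all.vars(form)   # d is gone, only data columns remain
#> [1] "y" "x"

# The paste0() version yields the same formula
form_paste <- as.formula(paste0('y ~ poly(x, degree =', d, ')'))
identical(deparse(form), deparse(form_paste))
#> [1] TRUE

fit <- lm(form, data = training)
length(coef(fit))   # intercept + 4 polynomial terms
#> [1] 5
```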

Inside the loop, store each fit in a list so it isn't overwritten on the next iteration:

ds <- 1:20
fitlst <- vector('list', length(ds))
for(i in seq_along(ds)){
  print(i)
  d <- ds[i]
  
  fitlst[[i]] <- train(as.formula(paste0('y ~ poly(x, degree =', d, ')')),
                       data = training, method = "lm", trControl = fitCtrl)
}

Output:

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] 18
[1] 19
[1] 20
> fitlst[[4]]
Linear Regression 

100 samples
  1 predictor

No pre-processing
Resampling: Leave-One-Out Cross-Validation 
Summary of sample sizes: 99, 99, 99, 99, 99, 99, ... 
Resampling results:

  RMSE        Rsquared   MAE       
  0.03790195  0.9779768  0.02937452

Tuning parameter 'intercept' was held constant at a value of TRUE
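If the goal of the loop is to pick a degree, the leave-one-out RMSE of an ordinary least-squares fit can also be computed without refitting 100 times: the i-th leave-one-out residual equals e_i / (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix (hatvalues() in R). The sketch below regenerates the question's data (assuming the operators missing from f and y are +) and uses lm() in place of train(). It is an independent check, not how caret computes its resampling results, and I haven't compared its numbers with caret's output.

```r
# Regenerate the question's simulated data (assumes `+ phi` and `f(x) + noise`)
set.seed(1)
omega <- 0.5 * pi
xi <- 0.5
phi <- 0.5 * pi
f <- function(t) {
  1 - exp(-xi * omega * t) * sin(sqrt(1 - xi^2) * omega * t + phi) / sin(phi)
}
x <- seq(0, 2 * pi, by = 2 * pi / 99)
training <- data.frame(x = x, y = f(x) + rnorm(length(x), mean = 0, sd = 0.03))

# Leave-one-out RMSE from a single least-squares fit:
# the i-th held-out residual is e_i / (1 - h_ii)
loocv_rmse <- function(fit) {
  loo_res <- residuals(fit) / (1 - hatvalues(fit))
  sqrt(mean(loo_res^2))
}

ds <- 1:20
rmse <- vapply(ds, function(d) {
  # splice the current degree into the formula with bquote()
  form <- eval(bquote(y ~ poly(x, degree = .(d))))
  loocv_rmse(lm(form, data = training))
}, numeric(1))

# degree with the smallest leave-one-out RMSE
best_degree <- ds[which.min(rmse)]
best_degree
```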