I am a newbie in the field of data science and I would like to identify the reason why I have been facing the following error:
Error in grid_latin_hypercube(): these arguments contain unknowns: `mtry`. See the finalize() function.
My computational routine is structured as follows:
datatrain <- training(data)
rf_mod <- rand_forest(
  trees = tune(),
  min_n = tune(),
  mtry = tune()
) %>%
  set_engine("randomForest") %>%
  set_mode("regression")
tuneargs <- rf_mod
reci <- recipe(Response ~ ., datatrain)
workf <- workflow() %>%
  add_model(tuneargs) %>%
  add_recipe(reci)
rand_grid <- grid_latin_hypercube(
  trees(),
  min_n(),
  mtry(),
  size = 100
)
After the call to grid_latin_hypercube(), the error described above appears.
I suspect the error arises because I'm combining the randomForest engine with grid_latin_hypercube(), so the parameter specifications may be incompatible.
If so, how can I solve it?
CodePudding user response:
In a random forest, mtry defines the number of predictors randomly sampled as candidates at each split.
Because you didn't tell mtry() which training data it applies to, the upper bound of its range is undefined:
library(dials)
library(dplyr)
mtry() %>% range_get()
$lower
[1] 1
$upper
unknown()
This is the unknown mentioned in the error message.
To avoid the error, you could, as the message suggests, use finalize() to tell mtry() which predictors are available:
rand_grid <- grid_latin_hypercube(
  trees(),
  min_n(),
  finalize(mtry(), select(datatrain, -Response)),
  size = 100
)
## A tibble: 100 × 3
#    trees min_n  mtry
#    <int> <int> <int>
#  1  1721    15    11
#  2   730    12     4
#  3  1081     3     1
#  4   184    24     8
#  5   222     2     3
#  6  1584    31     6
#  7   400    15     8
#  8  1049    14     7
#  9  1786     8     4
# 10  1378    11    10
## … with 90 more rows
## ℹ Use `print(n = ...)` to see more rows
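If you would rather not depend on the training data, another option is to give mtry() an explicit range yourself. The sketch below is only illustrative: it uses the built-in mtcars data as a hypothetical stand-in for datatrain (with mpg playing the role of Response), and the upper bound of 5 is an arbitrary choice. Note that recent dials releases may print a deprecation warning for grid_latin_hypercube() that points to grid_space_filling().

```r
library(dials)

# Hypothetical stand-in for select(datatrain, -Response): mtcars without `mpg`
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]

# Option 1: finalize() sets the upper bound to the number of predictor columns
mtry_final <- finalize(mtry(), predictors)
range_get(mtry_final)

# Option 2: set the range by hand (the upper bound 5 is an arbitrary choice)
mtry_manual <- mtry(range = c(1L, 5L))

# Either object can be passed to the grid function in place of a bare mtry()
set.seed(123)
rand_grid <- grid_latin_hypercube(trees(), min_n(), mtry_manual, size = 20)
range(rand_grid$mtry)
```

Either way, the point is the same: the grid function needs a fully specified range for every parameter before it can sample from it.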