Error in grid_latin_hypercube() for randomForest model in R

Time:11-02

I am new to data science and would like to understand why I am getting the following error:

Error in grid_latin_hypercube(): these arguments contain unknowns: `mtry`. See the finalize() function. 

My code is structured as follows:

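```r
library(tidymodels)

# `data` is an rsplit object (e.g. created with initial_split())
datatrain <- training(data)

# Random forest regression with three tuning parameters
rf_mod <- rand_forest(
  trees = tune(),
  min_n = tune(),
  mtry = tune()
) %>%
  set_engine("randomForest") %>%
  set_mode("regression")

tuneargs <- rf_mod

reci <- recipe(Response ~ ., datatrain)

workf <- workflow() %>%
  add_model(tuneargs) %>%
  add_recipe(reci)

# This is the call that raises the error
rand_grid <- grid_latin_hypercube(trees(),
                                  min_n(),
                                  mtry(),
                                  size = 100)
```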

Running the last call produces the error shown above.

I suspect the error is related to combining the randomForest engine with grid_latin_hypercube(), i.e. that the parameter specifications are somehow incompatible.

How can I fix this?

Answer:

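In a random forest, `mtry` is the number of predictors randomly sampled as split candidates at each split.

Its upper bound therefore depends on how many predictors your data has. Because you didn't tell mtry() which training data it applies to, that range is undefined: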

```r
library(dials)
library(dplyr)

mtry() %>% range_get()
#> $lower
#> [1] 1
#>
#> $upper
#> unknown()
```

This is the unknown mentioned in the error message.
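If you already know how many predictors the model will see, another way to remove the unknown is to give mtry() an explicit range. The sketch below is only an illustration: it assumes 10 predictors (a hypothetical number; in practice use the number of predictor columns in your own data) and otherwise reuses the same grid call as the question.

```r
library(dials)

# Hypothetical: assume the model will see 10 predictor columns
n_predictors <- 10

# An explicit range replaces unknown() with a concrete upper bound
mtry_manual <- mtry(range = c(1, n_predictors))
range_get(mtry_manual)

# With no unknowns left, the Latin hypercube grid can be built
rand_grid_manual <- grid_latin_hypercube(trees(),
                                         min_n(),
                                         mtry_manual,
                                         size = 100)
```

Hard-coding the bound is simple, but it has to be updated by hand whenever the set of predictors changes.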

To avoid the error, you can, as the message suggests, use finalize() to tell mtry() which predictors to use. It then sets the upper bound to the number of predictor columns:

```r
rand_grid <- grid_latin_hypercube(trees(),
                                  min_n(),
                                  finalize(mtry(), select(datatrain, -Response)),
                                  size = 100)
rand_grid
#> # A tibble: 100 × 3
#>    trees min_n  mtry
#>    <int> <int> <int>
#>  1  1721    15    11
#>  2   730    12     4
#>  3  1081     3     1
#>  4   184    24     8
#>  5   222     2     3
#>  6  1584    31     6
#>  7   400    15     8
#>  8  1049    14     7
#>  9  1786     8     4
#> 10  1378    11    10
#> # … with 90 more rows
#> # ℹ Use `print(n = ...)` to see more rows
```
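The same finalize() step can also be applied to the whole parameter set of the workflow, so the grid is built from exactly the parameters marked with tune(). This is a sketch under assumptions: it needs the tidymodels meta-package (and the randomForest package for the engine) installed, and it uses mtcars with mpg as a hypothetical stand-in for the question's datatrain and Response.

```r
library(tidymodels)

# Hypothetical stand-in for `datatrain`: mtcars, with `mpg` as the response
datatrain <- mtcars

rf_mod <- rand_forest(trees = tune(), min_n = tune(), mtry = tune()) %>%
  set_engine("randomForest") %>%
  set_mode("regression")

workf <- workflow() %>%
  add_model(rf_mod) %>%
  add_recipe(recipe(mpg ~ ., data = datatrain))

# Collect the tuning parameters; mtry's upper bound is still unknown here
rf_params <- extract_parameter_set_dials(workf)

# finalize() fills in the unknown bound from the predictor columns
rf_params <- finalize(rf_params, select(datatrain, -mpg))

set.seed(123)
rand_grid <- grid_latin_hypercube(rf_params, size = 100)
```

Because this recipe has no preprocessing steps, the raw predictor columns give the right count. If your recipe adds or removes columns (for example by creating dummy variables), you would likely want to finalize against the processed predictors instead.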