How do I set the upper range mtry tuning value in mlr3, when I also conduct automated feature select

Time:08-18

Date: 2022-08-17. R Version: 4.0.3. Platform: x86_64-apple-darwin17.0 (64-bit)

Problem: In mlr3 (classif.task, learner: random forest), I use automated hyperparameter optimization (HPO; mtry in the range between 1 and the number of features in the data), and automated feature selection (single criterion: msr = classif.auc).

I run into this ranger error message: 'mtry can not be larger than number of variables in data. Ranger will EXIT now.' I am fairly sure the error occurs when feature selection has picked a subset of the features and HPO then tries an mtry value larger than the size of that subset. If this is true, how do I set the upper limit of the mtry range in HPO in such a case (see reprex below)?

# Make data with binary outcome.
set.seed(123); n <- 500
for(i in 1:9) {
    assign(paste0("x", i), rnorm(n=n, mean = 0, sd = sample(1:6,1)))
}
z <- 0 + (.02*x1) + .03*x2 - .06*x3 + .03*x4 + .1*x5 + .08*x6 + .09*x7 - .008*x8 + .045*x9
pr = 1/(1 + exp(-z))
y = rbinom(n, 1, pr)
dat <- data.frame(y=factor(y), x1, x2, x3, x4, x5, x6, x7, x8, x9)
# 
library(mlr3verse)
tskclassif <- TaskClassif$new(id="rangerCheck", backend=dat, target="y")
randomForest <- lrn("classif.ranger", predict_type = "prob")
# Question: How do I set the upper range limit for the mtry parameter, in order to not get the error message?
searchSpaceRANDOMFOREST <- ps(mtry=p_int(lower = 1, upper = (ncol(dat)-1)))
# Hyperparameter optimization
resamplingTuner <- rsmp("cv", folds=4)
atRANDOMFOREST <- AutoTuner$new(
    learner=randomForest,
    resampling = resamplingTuner,
    measure = msr("classif.auc"),
    search_space = searchSpaceRANDOMFOREST,
    terminator = trm("evals", n_evals = 10),
    tuner = tnr("random_search"))
# Feature selection
instance = FSelectInstanceSingleCrit$new(
    task = tskclassif,
    learner = atRANDOMFOREST,
    resampling = rsmp("holdout", ratio = .8),
    measure = msr("classif.auc"),
    terminator = trm("evals", n_evals = 20)
)
fselector <- fs("random_search")
fselector$optimize(instance)
# Error message:
# Error: mtry can not be larger than number of variables in data. Ranger will EXIT now.
# Error in ranger::ranger(dependent.variable.name = task$target_names, data = task$data(),  : User interrupt or internal error.
# This happened PipeOp classif.ranger.tuned's $train()

Answer:

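You should be able to tune the mtry.ratio parameter of classif.ranger (see https://mlr3learners.mlr-org.com/reference/mlr_learners_classif.ranger.html) instead of mtry. It sets mtry as a fraction of the features currently available to the learner, so the value picked during tuning does not exceed the number of features left after feature selection. Note that mtry and mtry.ratio are mutually exclusive, so set only one of them.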
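A minimal sketch of what that could look like, reusing the learner and tuning settings from the question. The names mtry_from_ratio, searchSpaceRatio and atRatio are made up for illustration. The helper only mirrors the conversion rule as I understand it from the mlr3learners documentation (mtry = max(ceiling(mtry.ratio * n_features), 1)); treat that as an assumption and check the linked reference for your installed version.

```r
library(mlr3verse)

# Illustration only (hypothetical helper, not part of mlr3): the assumed rule
# for turning a ratio into an mtry value for the features currently in the task.
mtry_from_ratio <- function(ratio, n_features) {
    max(ceiling(ratio * n_features), 1)
}
mtry_from_ratio(1, 3)    # 3 features selected, ratio 1   -> mtry 3
mtry_from_ratio(0.5, 9)  # all 9 features, ratio 0.5     -> mtry 5
mtry_from_ratio(0, 9)    # ratio 0 is clamped            -> mtry 1

# Same learner as in the question.
randomForest <- lrn("classif.ranger", predict_type = "prob")

# Tune the fraction of features instead of an absolute count.
searchSpaceRatio <- ps(mtry.ratio = p_dbl(lower = 0, upper = 1))

# AutoTuner set up as in the question, only the search space differs.
atRatio <- AutoTuner$new(
    learner = randomForest,
    resampling = rsmp("cv", folds = 4),
    measure = msr("classif.auc"),
    search_space = searchSpaceRatio,
    terminator = trm("evals", n_evals = 10),
    tuner = tnr("random_search"))
```

With this, atRatio would take the place of atRANDOMFOREST as the learner in FSelectInstanceSingleCrit$new(); the rest of the feature selection code can stay as it is.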
