reg:gamma in a random forest model in R


When building an xgboost model through the boost_tree() function, it is possible to request a gamma regression via the objective argument of set_engine(), as seen below:

xgbst = boost_tree(
         trees = tune(), 
         tree_depth = tune(),
         min_n = tune(),
         learn_rate = tune(),
         loss_reduction = tune(),
         sample_size = tune()) %>%
set_engine("xgboost", objective = "reg:gamma") %>%
set_mode("regression")

However, I am interested in using a random forest model. Assuming that engine arguments could be passed the same way, I tried to repeat the procedure above for random forest, like below:

library(randomForest)
library(parsnip)
rfmod = rand_forest(
         trees = tune(),
         mtry = tune(),
         min_n = tune()) %>%
set_engine("randomForest", objective = "reg:gamma") %>%
set_mode("regression")

As a result, I get an error saying that Gamma regression cannot be used with this model and that the specification is invalid. In the literature, however, I have found works that use the Gamma distribution in random forest models.

In this case, how could I solve it?

CodePudding user response:

To my understanding, the method you are looking for is not available in rpart or randomForest. However, the distRforest package provides a function called rforest() that lets you choose the type of forest to build through its method argument, and one of the options is the Gamma deviance:

method = 'gamma'

You can also track the out-of-bag error under the gamma criterion by setting:

track_oob = TRUE when using method = 'gamma'

See if you are able to configure your model that way, using the package vignette as a guide.
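A minimal sketch of what that could look like, assuming the rforest() interface described in the distRforest documentation (the ntrees, ncand, and subsample argument names are taken from there and may differ in your installed version; mpg is only a stand-in for a positive, gamma-distributed response):

```r
# distRforest is on GitHub, not CRAN, e.g.:
# remotes::install_github("henckr/distRforest")
library(distRforest)

set.seed(42)
fit <- rforest(
  formula = mpg ~ .,   # replace with your own gamma-distributed target
  data = mtcars,
  method = "gamma",    # grow the trees on the gamma deviance
  ntrees = 500,        # number of trees in the forest
  ncand = 3,           # candidate variables sampled at each split (like mtry)
  subsample = 0.75,    # fraction of rows bootstrapped per tree
  track_oob = TRUE     # record the out-of-bag error per tree
)
```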

CodePudding user response:

From the "Random Forests(TM) in XGBoost" documentation it follows that you can make xgboost fit a random forest, provided sample_size is less than 1 and no boosting stages are performed:

library(parsnip)

xgbst = boost_tree(
         trees = 100,
         learn_rate = 1,
         sample_size = 0.8) %>%
set_engine("xgboost", objective = "reg:gamma",
           num_boost_round = 1, colsample_bytree = 0.8, counts = FALSE) %>%
set_mode("regression")

fit(xgbst, mpg ~ ., data = mtcars)
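The same idea can be sketched with the xgboost package directly, growing the whole forest in a single boosting round via num_parallel_tree, as the XGBoost random forest documentation describes (mpg is again just an example positive response; tune the subsampling parameters to your data):

```r
library(xgboost)

# predictors must be a numeric matrix for xgb.DMatrix
dtrain <- xgb.DMatrix(data = as.matrix(mtcars[, -1]), label = mtcars$mpg)

params <- list(
  objective = "reg:gamma",  # gamma deviance objective
  eta = 1,                  # no shrinkage, as in a random forest
  subsample = 0.8,          # row subsampling per tree
  colsample_bynode = 0.8,   # column subsampling per split
  num_parallel_tree = 100   # grow all trees in one round
)

rf_gamma <- xgb.train(params = params, data = dtrain, nrounds = 1)
head(predict(rf_gamma, dtrain))
```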
