It is known that when building an xgboost model through the boost_tree()
function, it is possible to request a gamma regression through the objective argument of the set_engine()
function, as seen below:
xgbst = boost_tree(
  trees = tune(),
  tree_depth = tune(),
  min_n = tune(),
  learn_rate = tune(),
  loss_reduction = tune(),
  sample_size = tune()) %>%
  set_engine("xgboost", objective = "reg:gamma") %>%
  set_mode("regression")
However, I am interested in using a random forest model. Since set_engine() passes extra arguments on to the underlying engine, I tried to repeat the same procedure for random forest, like below:
library(randomForest)
library(parsnip)
rfmod = rand_forest(
  trees = tune(),
  mtry = tune(),
  min_n = tune()) %>%
  set_engine("randomForest", objective = "reg:gamma") %>%
  set_mode("regression")
As a result, I get an error saying that gamma regression cannot be used with this model and that the specification is invalid. In the literature I have already found works that use the Gamma distribution in random forest models.
In this case, how could I solve it?
CodePudding user response:
To my understanding, the method you are looking for is not available in rpart
or randomForest.
However, there is a package, distRforest,
which has a function called rforest()
that lets you set the type of forest to build. One of the options is gamma:
method = 'gamma'
You can also track the out-of-bag error under the gamma deviance by setting:
track_oob = TRUE
when using method = 'gamma'.
See if you are able to configure your model that way, with the package vignette as a guide.
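For reference, here is a minimal sketch of what that could look like. distRforest is a GitHub package, and the argument names below (ntrees, ncand, subsample) are taken from its vignette, so treat them as assumptions and check them against the version you install:

```r
# Hedged sketch: install from GitHub first, e.g.
# remotes::install_github("henckr/distRforest")
library(distRforest)

gamma_rf <- rforest(
  formula = mpg ~ .,   # response must be strictly positive for the gamma deviance
  data = mtcars,
  method = "gamma",    # grow the trees using the gamma deviance as loss
  ntrees = 500,        # number of trees in the forest (assumed argument name)
  ncand = 3,           # candidate variables per split, analogous to mtry (assumed)
  subsample = 0.75,    # fraction of rows used per tree (assumed)
  track_oob = TRUE     # record the out-of-bag error per tree
)
```

With track_oob = TRUE, the fitted object should carry the per-tree out-of-bag error so you can judge how many trees are actually needed; consult the vignette for the exact accessor and prediction helpers.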
CodePudding user response:
From Random Forests(TM) in XGBoost it follows that you can make xgboost fit a random forest, provided sample_size is less than 1 and only a single boosting round is performed:
library(parsnip)
xgbst = boost_tree(
  trees = 100,
  learn_rate = 1,
  sample_size = 0.8) %>%
  set_engine("xgboost", objective = "reg:gamma",
             num_boost_round = 1, colsample_bytree = 0.8, counts = FALSE) %>%
  set_mode("regression")
fit(xgbst, mpg ~ ., data = mtcars)
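The same idea is more explicit in the native xgboost interface, where the random-forest mode is spelled out directly: num_parallel_tree grows many trees within one round, and nrounds = 1 disables boosting. This is a sketch only, using mtcars$mpg as a stand-in for a strictly positive response; the specific parameter values here are illustrative, not recommendations:

```r
library(xgboost)

X <- as.matrix(mtcars[, -1])
y <- mtcars$mpg  # gamma objective requires a strictly positive response

rf_gamma <- xgboost(
  data = X, label = y,
  objective = "reg:gamma",
  nrounds = 1,              # a single boosting round ...
  num_parallel_tree = 100,  # ... containing 100 trees grown in parallel
  eta = 1,                  # no shrinkage, as there is only one round
  subsample = 0.8,          # row subsampling per tree
  colsample_bynode = 0.8,   # column subsampling per split
  verbose = 0
)

preds <- predict(rf_gamma, X)
```

The key difference from a boosted model is that all 100 trees belong to the one round and are averaged, so eta plays no role between rounds; subsample and colsample_bynode provide the bagging-style randomness that makes the ensemble behave like a random forest.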