How to Get Variable/Feature Importance From Tidymodels ranger object?

Time:09-07

I have a ranger object from the tidymodels rand_forest function:

rf <- rand_forest(mode = "regression", trees = 1000) %>% fit(pay_rate ~ age + profession)

I want to get the feature importance of each variable (I have many more than in this example). I've tried things like rf$variable.importance, or importance(rf), but the former returns NULL and the latter function doesn't exist. I tried using the vip package, but that doesn't work for a ranger object. How can I extract feature importances from this object?

Answer:

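By default, ranger does not compute variable importance (its importance argument defaults to "none"), so you need to add importance = "impurity" (or "permutation") when you set the engine for ranger. The importance scores are then computed when the model is fit. Once this is set, you can use extract_fit_parsnip() with vip() to plot the variable importance.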

A small example:

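library(tidymodels)
library(vip)

# Ask ranger to compute impurity-based importance while fitting
rf_mod <- rand_forest(mode = "regression", trees = 100) %>%
  set_engine("ranger", importance = "impurity")

# Predict mpg from all other columns of mtcars
rf_recipe <-
  recipe(mpg ~ ., data = mtcars)

rf_workflow <-
  workflow() %>%
  add_model(rf_mod) %>%
  add_recipe(rf_recipe)

# Fit the workflow, pull out the parsnip fit, and plot the top 10 variables
rf_workflow %>%
  fit(data = mtcars) %>%
  extract_fit_parsnip() %>%
  vip(num_features = 10)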
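
If you want the scores as numbers rather than a plot, one option is to pull the underlying ranger object out of the fitted model and read its variable.importance element. The following is a minimal sketch, not the only way to do it: it assumes the model is fit directly from the parsnip spec (as in the question) on mtcars, and the names rf_fit, ranger_obj and importance_scores are made up for illustration.

```r
library(tidymodels)

set.seed(123)  # random forests are stochastic; fix the seed for repeatable scores

# Fit the parsnip spec directly, with importance requested from ranger
rf_fit <- rand_forest(mode = "regression", trees = 100) %>%
  set_engine("ranger", importance = "impurity") %>%
  fit(mpg ~ ., data = mtcars)

# The parsnip fit wraps the ranger object; extract_fit_engine() returns it
ranger_obj <- extract_fit_engine(rf_fit)

# Named numeric vector with one score per predictor
importance_scores <- ranger_obj$variable.importance

# Largest scores first
sort(importance_scores, decreasing = TRUE)
```

ranger::importance(ranger_obj) should return the same vector. Calling importance() without the ranger:: prefix fails unless ranger itself is attached, since library(tidymodels) does not attach it.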

More information is available in the tidymodels Get Started guide.
