In `tidymodels` how do I do an F test to compare two models?

Time:10-25

In base R it is easy to compare two models with the anova() function and get an F test.

library(MASS)
lm.fit1 <- lm(medv ~ . , data = Boston)
lm.fit1a <- update(lm.fit1, ~ . - age - black)

anova(lm.fit1a, lm.fit1)

If I am working with tidymodels workflows, how do I do the same comparison? I have code like this:

library(tidymodels)
lm_spec <- linear_reg() %>%
  set_mode("regression") %>%
  set_engine("lm")

the_rec <- recipe(medv ~ ., data = Boston)

the_workflow <- workflow() %>% 
  add_recipe(the_rec) %>% 
  add_model(lm_spec)

the_workflow_fit1 <- 
  fit(the_workflow, data = Boston)
tidy(the_workflow_fit1)


the_workflow_fit1a <- 
  the_workflow_fit1  %>% 
  update_recipe(the_rec %>% step_rm(age, black)) %>% 
  fit(data = Boston) 
tidy(the_workflow_fit1a)

I don't know how to extract the right object (thingy) to feed a statement like this:

anova(the_workflow_fit1a$thingy, the_workflow_fit1$thingy)

What is the thingy I need? Is there an elegant way to do this inside of the tidymodels ecosystem?

CodePudding user response:

I am not fully familiar with the tidymodels ecosystem, so I am not sure this is the elegant solution you are looking for.

I dug into the object the_workflow_fit1a and saw that subsetting it with .$fit$fit$fit returns the underlying lm object, which is what the anova() function needs.

So one possible solution is:

models <- list(the_workflow_fit1, the_workflow_fit1a)

models2 <- lapply(models, function(x) x$fit$fit$fit)

anova(models2[[1]], models2[[2]])

Output:

  Res.Df    RSS    Df `Sum of Sq`     F `Pr(>F)`
   <dbl>  <dbl> <dbl>       <dbl> <dbl>    <dbl>
1    492 11079.    NA         NA  NA    NA      
2    494 11351.    -2       -272.  6.05  0.00254
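
As a rough check on the table above, the F statistic can be reproduced by hand from the displayed residual sums of squares and degrees of freedom. The symbols below (RSS_full, RSS_reduced, df_full, df_reduced) are labels introduced here for illustration, and the inputs are the rounded values printed in the table, so the result only agrees with the reported 6.05 up to rounding:

```latex
F = \frac{(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}}) / (df_{\text{reduced}} - df_{\text{full}})}
         {\mathrm{RSS}_{\text{full}} / df_{\text{full}}}
  \approx \frac{(11351 - 11079) / (494 - 492)}{11079 / 492}
  = \frac{136}{22.52}
  \approx 6.04
```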

CodePudding user response:

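Many hours later, thanks to a post from @juliasilge (https://github.com/tidymodels/workflows/issues/54) that introduced me to pull_workflow_fit(), I have a tidymodels solution. The code below uses extract_fit_parsnip(), which has since replaced pull_workflow_fit() in the workflows package.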

The base R code:

library(MASS)
lm.fit1 <- lm(medv ~ . , data = Boston)
lm.fit1a <- update(lm.fit1, ~ . - age - black)
anova(lm.fit1a, lm.fit1)

Can be done in tidymodels with:

library(tidymodels)
lm_spec <- linear_reg() %>%
  set_mode("regression") %>%
  set_engine("lm")

the_rec <- recipe(medv ~ ., data = Boston)

the_workflow <- workflow() %>% 
  add_recipe(the_rec) %>% 
  add_model(lm_spec)

the_workflow_fit1 <- 
  fit(the_workflow, data = Boston) %>% 
  extract_fit_parsnip()

the_workflow_fit1a <- 
  the_workflow  %>% 
  update_recipe(
    the_rec %>% step_rm(age, black)
  ) %>% 
  fit(data = Boston) %>% 
  extract_fit_parsnip()

anova(the_workflow_fit1a$fit, the_workflow_fit1$fit)
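
A possible variation, offered as a sketch rather than the definitive approach: recent versions of workflows also provide extract_fit_engine(), which returns the engine-level object (here the lm fit) directly from a fitted workflow, so no $fit subsetting is needed. The names wf_full, wf_reduced and f_test are made up for this example, and it assumes a workflows version that includes the extract_*() functions.

```r
library(MASS)        # provides the Boston data set
library(tidymodels)

lm_spec <- linear_reg() %>%
  set_mode("regression") %>%
  set_engine("lm")

# Full model: every predictor in Boston
wf_full <- workflow() %>%
  add_recipe(recipe(medv ~ ., data = Boston)) %>%
  add_model(lm_spec) %>%
  fit(data = Boston)

# Reduced model: same specification with age and black removed
wf_reduced <- workflow() %>%
  add_recipe(recipe(medv ~ ., data = Boston) %>% step_rm(age, black)) %>%
  add_model(lm_spec) %>%
  fit(data = Boston)

# extract_fit_engine() hands back the underlying lm objects for anova()
f_test <- anova(extract_fit_engine(wf_reduced), extract_fit_engine(wf_full))
f_test
```

Keeping the fitted workflows intact (instead of overwriting them with the extracted parsnip fits) means they can still be used with predict() and tidy() afterwards.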