In base R it is easy to compare two models with the anova() function and get an F test.
library(MASS)
lm.fit1 <- lm(medv ~ . , data = Boston)
lm.fit1a <- update(lm.fit1, ~ . - age - black)
anova(lm.fit1a, lm.fit1)
If I am working with tidymodels
workflows. How do I do the same comparison? I have code like this:
library(tidymodels)
lm_spec <- linear_reg() %>%
set_mode("regression") %>%
set_engine("lm")
the_rec <- recipe(medv ~ ., data = Boston)
the_workflow <- workflow() %>%
add_recipe(the_rec) %>%
add_model(lm_spec)
the_workflow_fit1 <-
fit(the_workflow, data = Boston)
tidy(the_workflow_fit1)
the_workflow_fit1a <-
the_workflow_fit1 %>%
update_recipe(the_rec %>% step_rm(age, black)) %>%
fit(data = Boston)
tidy(the_workflow_fit1a)
I don't know how to extract the right object (thingy) to feed a statement like this:
anova(the_workflow_fit1a$thingy, the_workflow_fit1$thingy)
What is the thingy I need? Is there an elegant way to do this inside of the tidymodels
ecosystem?
CodePudding user response:
I am not fully familiar with tidymodels
ecosystem therefore I am not sure this is the elegant solution that you look for.
I dig into the object the_workflow_fit1a
and saw that subsetting .$fit$fit$fit
serves the lm
object which is needed by anova
function.
So, in this way a solution can be considered;
models <- list(the_workflow_fit1,the_workflow_fit1a)
models2 <- lapply(models,function(x) x$fit$fit$fit)
anova(models2[[1]],models2[[2]])
output;
Res.Df RSS Df `Sum of Sq` F `Pr(>F)`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 492 11079. NA NA NA NA
2 494 11351. -2 -272. 6.05 0.00254
CodePudding user response:
Many hours later and a post from @juliasilge https://github.com/tidymodels/workflows/issues/54 which introduced me to pull_workflow_fit()
I have a tidymodels
solution.
The base R code:
library(MASS)
lm.fit1 <- lm(medv ~ . , data = Boston)
lm.fit1a <- update(lm.fit1, ~ . - age - black)
anova(lm.fit1a, lm.fit1)
Can be done in tidymodels
with:
library(tidymodels)
lm_spec <- linear_reg() %>%
set_mode("regression") %>%
set_engine("lm")
the_rec <- recipe(medv ~ ., data = Boston)
the_workflow <- workflow() %>%
add_recipe(the_rec) %>%
add_model(lm_spec)
the_workflow_fit1 <-
fit(the_workflow, data = Boston) %>%
extract_fit_parsnip()
the_workflow_fit1a <-
the_workflow %>%
update_recipe(
the_rec %>% step_rm(age, black)
) %>%
fit(data = Boston) %>%
extract_fit_parsnip()
anova(the_workflow_fit1a$fit, the_workflow_fit1$fit)