I'm trying to build a model formula that includes the interaction of one variable with another, but not that second variable's main effect on its own. In base R I can specify exactly which interactions I want with a colon in the formula, but I can't figure out how to do this with recipes. I've put together a quick reprex below showing roughly what I'm after. If anyone has any advice, that would be great :)
library(tidymodels)
basic_mod <- lm(Petal.Length ~ Petal.Width + Petal.Width:Species, data = iris)
iris_rec <- recipe(Petal.Length ~ Petal.Width + Species, data = iris) |>
  step_dummy("Species") |>
  step_interact(~ Petal.Width:starts_with("Species"))
formula(iris_rec |> prep()) # This formula includes Species on its own as well as the interaction term
#> Petal.Length ~ Petal.Width + Species_versicolor + Species_virginica +
#>     Petal.Width_x_Species_versicolor + Petal.Width_x_Species_virginica
#> <environment: 0x127838968>
iris_rec |>
  remove_role(starts_with("Species"), old_role = "predictor") |>
  prep() |>
  formula() # This formula still includes Species on its own
#> Petal.Length ~ Petal.Width + Species_versicolor + Species_virginica +
#>     Petal.Width_x_Species_versicolor + Petal.Width_x_Species_virginica
#> <environment: 0x1106178a0>
Created on 2022-11-21 with reprex v2.0.2
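For reference, the predictor columns that the base R formula produces can be listed with model.matrix(). This is a minimal base R sketch of the target; the name target_cols is only illustrative and not part of the reprex above.

```r
# Design matrix implied by the base R formula from the question:
# Petal.Width on its own plus its interaction with Species,
# with no stand-alone Species columns.
X <- model.matrix(~ Petal.Width + Petal.Width:Species, data = iris)

# Column names of the design matrix (intercept, slope, and the
# two slope adjustments for the non-reference species).
target_cols <- colnames(X)
print(target_cols)
```

These are the terms a recipe would need to reproduce, apart from the different naming of the interaction columns.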
Answer:
If I'm following you correctly, you would use step_interact() to make the interactions and then look for the default separator ("_x_") to select which terms to keep. We ask that you make dummy variables before interactions.
library(tidymodels)
rec <-
  recipe(Petal.Length ~ ., data = iris) %>%
  step_dummy(Species) %>%
  # dummy indicators, by default, start with {varname}_
  step_interact(~ Petal.Width:starts_with("Species_")) %>%
  step_select(all_outcomes(), contains("_x_"))
rec %>% prep() %>% bake(new_data = NULL) %>% names()
#> [1] "Petal.Length" "Petal.Width_x_Species_versicolor"
#> [3] "Petal.Width_x_Species_virginica"
Created on 2022-11-21 by the reprex package (v2.0.1)
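The recipe above keeps only the outcome and the interaction columns. The base R model in the question also keeps Petal.Width as a main effect, so a variant that drops just the stand-alone dummy columns may be closer to what was asked. The following is a sketch under that assumption, not the only way to do it; it uses step_rm() in place of step_select(), and the names rec_keep_width and baked are illustrative.

```r
library(recipes)

# Sketch: keep Petal.Width and its interactions with the Species dummies,
# but remove the stand-alone dummy columns.
rec_keep_width <-
  recipe(Petal.Length ~ Petal.Width + Species, data = iris) |>
  step_dummy(Species) |>
  step_interact(~ Petal.Width:starts_with("Species_")) |>
  # The interaction columns start with "Petal.Width_x_", so this selector
  # only matches the plain dummy columns created by step_dummy().
  step_rm(starts_with("Species_"))

baked <- rec_keep_width |> prep() |> bake(new_data = NULL)
names(baked)
```

Fitting lm(Petal.Length ~ ., data = baked) on the result should then use the same terms as the base R formula Petal.Length ~ Petal.Width + Petal.Width:Species.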