One-hot encoding only factor variables in R recipes

Time:11-10

I have a data frame df like this:

height  age  dept
    69   18     A
    44    8     B
    72   19     B
    58   34     C

I want to one-hot encode only the factor variables (only dept is a factor). How can I do this?

Currently I'm selecting every column with everything():

library(dplyr)  # provides %>%
ohe <- df %>% 
    recipes::recipe(~ .) %>%
    recipes::step_dummy(tidyselect::everything()) %>%
    recipes::prep() %>%
    recipes::bake(df)

and I get this warning:

Warning message: The following variables are not factor vectors and will be ignored: height, age
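The warning appears because step_dummy() only converts factor (nominal) columns; height and age are integers, so everything() hands it columns it cannot use and it skips them. A quick base-R check, rebuilding the same df shown in the data section, shows which columns is.factor() would pick up:

```r
# Rebuild the example data (equivalent to the dput() output in the answer)
df <- data.frame(
  height = c(69L, 44L, 72L, 58L),
  age    = c(18L, 8L, 19L, 34L),
  dept   = factor(c("A", "B", "B", "C"))
)

# Logical vector: TRUE for each column that is a factor
is_fct <- vapply(df, is.factor, logical(1))
factor_cols <- names(df)[is_fct]
print(factor_cols)
# [1] "dept"
```

Only dept is a factor, so a factor-only selector should hand step_dummy() exactly that one column.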

Answer:

Use where(is.factor) instead of everything() so that only the factor columns are selected:

library(dplyr)  # provides %>%
df %>% 
    recipes::recipe(~ .) %>%
    # where() keeps only the columns for which is.factor() returns TRUE (here: dept)
    recipes::step_dummy(tidyselect:::where(is.factor)) %>%
    recipes::prep() %>%
    recipes::bake(df)

Output:

# A tibble: 4 × 4
  height   age dept_B dept_C
   <int> <int>  <dbl>  <dbl>
1     69    18      0      0
2     44     8      1      0
3     72    19      1      0
4     58    34      0      1
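Note that step_dummy() uses reference (treatment) coding by default, which is why the output has no dept_A column: A is the baseline level and is represented by zeros in dept_B and dept_C. If you want a true one-hot encoding with one column per level, step_dummy() has a one_hot argument. A sketch, reusing the answer's selector and the same data:

```r
library(recipes)

# Same example data as in the question
df <- data.frame(
  height = c(69L, 44L, 72L, 58L),
  age    = c(18L, 8L, 19L, 34L),
  dept   = factor(c("A", "B", "B", "C"))
)

rec <- recipe(~ ., data = df)
# Same factor-only selector as the answer; one_hot = TRUE keeps a column
# for every level (dept_A, dept_B, dept_C) instead of dropping the first
rec <- step_dummy(rec, tidyselect:::where(is.factor), one_hot = TRUE)
ohe <- bake(prep(rec), new_data = df)
print(ohe)
```

height and age pass through untouched; the original dept column is replaced by the three indicator columns.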

Data:

df <- structure(list(height = c(69L, 44L, 72L, 58L), age = c(18L, 8L, 
19L, 34L), dept = structure(c(1L, 2L, 2L, 3L), .Label = c("A", 
"B", "C"), class = "factor")), row.names = c(NA, -4L), class = "data.frame")
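recipes also provides its own role-aware selectors, so instead of reaching into tidyselect you can use all_nominal_predictors() (available in recent recipes releases; older ones offer all_nominal()). One difference to keep in mind: it selects character columns as well as factors, whereas where(is.factor) selects factors only. A sketch with the same data:

```r
library(recipes)

df <- data.frame(
  height = c(69L, 44L, 72L, 58L),
  age    = c(18L, 8L, 19L, 34L),
  dept   = factor(c("A", "B", "B", "C"))
)

# With `~ .` every column gets the "predictor" role, so
# all_nominal_predictors() selects the factor/character predictors: just dept
rec <- recipe(~ ., data = df)
rec <- step_dummy(rec, all_nominal_predictors())
ohe <- bake(prep(rec), new_data = df)
print(ohe)
```

This should produce the same columns as the answer's output (height, age, dept_B, dept_C).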