Convert multiple columns that are onehot encoded into a factor column


Some columns in my dataset are one-hot encoded. I would like to combine each group of them into a single factor column.

I want to write code in which I specify which columns should be combined and converted into a factor column.

Below is an example with the desired result.


library(tibble)

tbl <- tibble(
  # one hot encoded
  a1_blue = c(1, 0, 0),
  a1_red = c(0, 1, 0),
  a1_green = c(0, 0, 1),
  # one hot encoded
  a2_square = c(1, 0, 0),
  a2_circle = c(0, 1, 0),
  a2_dot = c(0, 0, 1),
  a3_letters = factor(c("A", "B", "C"))
)

tbl_desired <- tibble(
  a1_colors = factor(c("blue", "red", "green"),
                     levels = c("blue", "red", "green")),
  a2_shapes = factor(c("square", "circle", "dot"),
                     levels = c("square", "circle", "dot")),
  a3_letters = factor(c("A", "B", "C"))
)

CodePudding user response:

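Reshaping to long format, dropping the zeros, and pivoting back to wide will give you the structure that you need. The new columns come out as character; you can convert them to factors afterwards using mutate(across(everything(), as.factor)). Note that as.factor() sorts the levels alphabetically, so pass levels explicitly to factor() if you need the level order shown in tbl_desired.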

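library(dplyr)
library(tidyr)

tbl %>% 
  pivot_longer(-a3_letters) %>%                          # one row per one-hot column
  filter(value != 0) %>%                                 # keep only the active category
  separate(name, into = c("var", "val"), sep = "_") %>%  # "a1_blue" -> "a1", "blue"
  pivot_wider(id_cols = a3_letters, names_from = var, values_from = val)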

#> # A tibble: 3 x 3
#>   a3_letters a1    a2    
#>   <fct>      <chr> <chr> 
#> 1 A          blue  square
#> 2 B          red   circle
#> 3 C          green dot
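
If you would rather name the columns to combine explicitly, as the question asks, a small helper is one option. The sketch below is only an illustration: `collapse_onehot()` is a hypothetical function, not part of any package, and it assumes that every row has exactly one 1 among the chosen columns and that the level name is whatever follows the first underscore in the column name. Levels are kept in the order the columns are listed, which matches tbl_desired.

```r
library(tibble)
library(dplyr)

tbl <- tibble(
  a1_blue = c(1, 0, 0), a1_red = c(0, 1, 0), a1_green = c(0, 0, 1),
  a2_square = c(1, 0, 0), a2_circle = c(0, 1, 0), a2_dot = c(0, 0, 1),
  a3_letters = factor(c("A", "B", "C"))
)

# Hypothetical helper: collapse the one-hot columns `cols` into a single
# factor column called `new_name`, then drop the original columns.
collapse_onehot <- function(data, cols, new_name) {
  onehot <- as.matrix(data[cols])
  # Assumption: exactly one active category per row.
  stopifnot(all(rowSums(onehot == 1) == 1))
  # Level names: strip everything up to the first underscore.
  lvls <- sub("^[^_]*_", "", cols)
  # Position of the 1 in each row.
  idx <- max.col(onehot, ties.method = "first")
  data[[new_name]] <- factor(lvls[idx], levels = lvls)
  data[setdiff(names(data), cols)]
}

result <- tbl %>%
  collapse_onehot(c("a1_blue", "a1_red", "a1_green"), "a1_colors") %>%
  collapse_onehot(c("a2_square", "a2_circle", "a2_dot"), "a2_shapes") %>%
  select(a1_colors, a2_shapes, a3_letters)
```

The final select() only restores the column order used in tbl_desired; the helper itself appends each new column at the end.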