Some columns in my dataset are one-hot encoded, and I want to collapse each group of them into a single factor column.
I would like to write code in which I specify which columns should be combined and converted into a factor column.
Below is an example with the desired result.
library(tidyverse)
tbl <- tibble(
  # one hot encoded
  a1_blue = c(1, 0, 0),
  a1_red = c(0, 1, 0),
  a1_green = c(0, 0, 1),
  # one hot encoded
  a2_square = c(1, 0, 0),
  a2_circle = c(0, 1, 0),
  a2_dot = c(0, 0, 1),
  a3_letters = factor(c("A", "B", "C"))
)
tbl_desired <- tibble(
  a1_colors = factor(c("blue", "red", "green"),
                     levels = c("blue", "red", "green")),
  a2_shapes = factor(c("square", "circle", "dot"),
                     levels = c("square", "circle", "dot")),
  a3_letters = factor(c("A", "B", "C"))
)
CodePudding user response:
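This will give you the structure that you need. You can then convert the columns to factors using mutate(across(everything(), as.factor)). Note that as.factor orders the levels alphabetically; to keep the order of first appearance, as in the desired result, forcats::fct_inorder can be used instead.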
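tbl %>%
  # stack every one-hot column into name/value pairs, keeping a3_letters as the row key
  pivot_longer(-a3_letters) %>%
  # keep only the category that is switched on in each row
  filter(value != 0) %>%
  # split e.g. "a1_blue" into var = "a1" and val = "blue"
  separate(name, into = c("var", "val")) %>%
  # id_cols is named explicitly; newer tidyr versions may reject an unnamed -value here
  pivot_wider(id_cols = a3_letters, names_from = var, values_from = val)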
#> # A tibble: 3 x 3
#>   a3_letters a1    a2
#>   <fct>      <chr> <chr>
#> 1 A          blue  square
#> 2 B          red   circle
#> 3 C          green dot
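Since the question asks for a way to specify which columns get combined, here is a minimal sketch of a reusable helper. The function name combine_one_hot and its arguments prefix and new_name are hypothetical (they do not come from any package), and the sketch assumes that each row has exactly one 1 within the chosen group of columns. Levels are taken from the column order, which matches the desired result in the question.

```r
library(dplyr)

# Hypothetical helper, not part of any package: collapses the one-hot columns
# whose names start with `prefix` into one factor column called `new_name`.
combine_one_hot <- function(data, prefix, new_name) {
  cols <- names(data)[startsWith(names(data), prefix)]
  # Level labels are the column names with the prefix stripped, in column order
  lvls <- substring(cols, nchar(prefix) + 1)
  m <- as.matrix(data[cols])
  # Assumption: exactly one 1 per row; stop early if that does not hold
  stopifnot(all(rowSums(m) == 1))
  # Position of the column holding the 1 in each row
  idx <- max.col(m, ties.method = "first")
  data[[new_name]] <- factor(lvls[idx], levels = lvls)
  # Drop the original one-hot columns
  data[setdiff(names(data), cols)]
}

tbl <- tibble(
  a1_blue = c(1, 0, 0),
  a1_red = c(0, 1, 0),
  a1_green = c(0, 0, 1),
  a2_square = c(1, 0, 0),
  a2_circle = c(0, 1, 0),
  a2_dot = c(0, 0, 1),
  a3_letters = factor(c("A", "B", "C"))
)

result <- tbl %>%
  combine_one_hot("a1_", "a1_colors") %>%
  combine_one_hot("a2_", "a2_shapes") %>%
  select(a1_colors, a2_shapes, a3_letters)
```

Unlike the pivot-based approach, this does not rely on a3_letters uniquely identifying each row, and it produces factors directly rather than character columns.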