How to replace multiple dummy-coded columns with labels efficiently?-CodePudding

I'm trying to replace some dummy coded variables with the actual categorical value. I know there's ways to replace this using base R, dplyr, and the *apply or within functions, but my brain just can't figure out which is most efficient without looping it.

Currently, I have 2 data frames. Examples of what they look like below. Note: Instead of 5 variables, there are closer to 10 and this operation needs to be done multiple times so looping or a simple df <- df[df$var == 1] <- "a" won't work.

df1
#   id  v1  v2  v3  v4  v5  
# 1  1   0  1  0  1  0
# 2  3   0  0  0  0  1
# 3  5   0  1  1  0  0
# 4  12  1  0  1  0  0
# 5  26  0  0  0  1  0
...

codes
#   name  label
# 1  v1  "a"
# 2  v2  "b"
# 3  v3  "c"
# 4  v4  "d"
# 5  v5  "e"

The goal is to get the values in codes$label to replace the 1's according to the relevant column name in df1 so the end result is:

df1
#   id  v1  v2  v3  v4  v5  
# 1  1   0  b  0  d  0
# 2  3   0  0  0  0  e
# 3  5   0  b  c  0  0
# 4  12  a  0  c  0  0
# 5  26  0  0  0  d  0

I know there's a way to do it, but I can't find any code snippets online that do it over multiple columns and when referencing another dataframe.

Thank you!

P.S. These are both technically tibbles() formatted with tidyverse

CodePudding user response：

Tidyverse solution. You can:

reshape df1 to long form
join with codes by name and value (for value = 1)
replace missing labels with 0
remove unwanted column value

reshape back to wide form using the new labels as values

library(dplyr)
library(tidyr)

df1 %>% 
  pivot_longer(-id) %>% 
  left_join(mutate(codes, value = 1)) %>% 
  mutate(label = ifelse(is.na(label), 0, label)) %>% 
  select(-value) %>% 
  pivot_wider(names_from = "name", 
              values_from = "label")

Result:

# A tibble: 5 × 6
     id v1    v2    v3    v4    v5   
  <int> <chr> <chr> <chr> <chr> <chr>
1     1 0     b     0     d     0    
2     3 0     0     0     0     e    
3     5 0     b     c     0     0    
4    12 a     0     c     0     0    
5    26 0     0     0     d     0

Data:

df1 <- structure(list(id = c(1L, 3L, 5L, 12L, 26L), 
                      v1 = c(0L, 0L, 0L, 1L, 0L), 
                      v2 = c(1L, 0L, 1L, 0L, 0L), 
                      v3 = c(0L, 0L, 1L, 1L, 0L), 
                      v4 = c(1L, 0L, 0L, 0L, 1L), 
                      v5 = c(0L, 1L, 0L, 0L, 0L)), 
                 class = "data.frame", 
                 row.names = c("1", "2", "3", "4", "5"))

codes <- structure(list(name = c("v1", "v2", "v3", "v4", "v5"), 
                        label = c("a", "b", "c", "d", "e")), 
                        class = "data.frame", 
                        row.names = c("1", "2", "3", "4", "5"))

CodePudding user response：

Wuth dplyr, you could use across cur_column():

library(dplyr)

df1 %>%
  mutate(across(v1:v5, ~ replace(.x, .x == 1, codes$label[codes$name == cur_column()])))

#   id v1 v2 v3 v4 v5
# 1  1  0  b  0  d  0
# 2  3  0  0  0  0  e
# 3  5  0  b  c  0  0
# 4 12  a  0  c  0  0
# 5 26  0  0  0  d  0