I'm trying to replace some dummy coded variables with the actual categorical value. I know there's ways to replace this using base R, dplyr, and the *apply or within functions, but my brain just can't figure out which is most efficient without looping it.
Currently, I have 2 data frames. Examples of what they look like below.
Note: Instead of 5 variables, there are closer to 10 and this operation needs to be done multiple times so looping or a simple df <- df[df$var == 1] <- "a"
won't work.
df1
# id v1 v2 v3 v4 v5
# 1 1 0 1 0 1 0
# 2 3 0 0 0 0 1
# 3 5 0 1 1 0 0
# 4 12 1 0 1 0 0
# 5 26 0 0 0 1 0
...
codes
# name label
# 1 v1 "a"
# 2 v2 "b"
# 3 v3 "c"
# 4 v4 "d"
# 5 v5 "e"
The goal is to get the values in codes$label
to replace the 1's according to the relevant column name in df1
so the end result is:
df1
# id v1 v2 v3 v4 v5
# 1 1 0 b 0 d 0
# 2 3 0 0 0 0 e
# 3 5 0 b c 0 0
# 4 12 a 0 c 0 0
# 5 26 0 0 0 d 0
I know there's a way to do it, but I can't find any code snippets online that do it over multiple columns and when referencing another dataframe.
Thank you!
P.S. These are both technically tibbles() formatted with tidyverse
CodePudding user response:
Tidyverse solution. You can:
reshape
df1
to long formjoin with codes by name and value (for value = 1)
replace missing labels with 0
remove unwanted column
value
reshape back to wide form using the new labels as values
library(dplyr) library(tidyr) df1 %>% pivot_longer(-id) %>% left_join(mutate(codes, value = 1)) %>% mutate(label = ifelse(is.na(label), 0, label)) %>% select(-value) %>% pivot_wider(names_from = "name", values_from = "label")
Result:
# A tibble: 5 × 6
id v1 v2 v3 v4 v5
<int> <chr> <chr> <chr> <chr> <chr>
1 1 0 b 0 d 0
2 3 0 0 0 0 e
3 5 0 b c 0 0
4 12 a 0 c 0 0
5 26 0 0 0 d 0
Data:
df1 <- structure(list(id = c(1L, 3L, 5L, 12L, 26L),
v1 = c(0L, 0L, 0L, 1L, 0L),
v2 = c(1L, 0L, 1L, 0L, 0L),
v3 = c(0L, 0L, 1L, 1L, 0L),
v4 = c(1L, 0L, 0L, 0L, 1L),
v5 = c(0L, 1L, 0L, 0L, 0L)),
class = "data.frame",
row.names = c("1", "2", "3", "4", "5"))
codes <- structure(list(name = c("v1", "v2", "v3", "v4", "v5"),
label = c("a", "b", "c", "d", "e")),
class = "data.frame",
row.names = c("1", "2", "3", "4", "5"))
CodePudding user response:
Wuth dplyr
, you could use across
cur_column()
:
library(dplyr)
df1 %>%
mutate(across(v1:v5, ~ replace(.x, .x == 1, codes$label[codes$name == cur_column()])))
# id v1 v2 v3 v4 v5
# 1 1 0 b 0 d 0
# 2 3 0 0 0 0 e
# 3 5 0 b c 0 0
# 4 12 a 0 c 0 0
# 5 26 0 0 0 d 0