Gather dummy variables and recode factors

I have

df<-data.frame(id=c(1,2,3,4,5),one=c(0,1,0,0,0), two=c(0,0,1,0,0), three=c(0,0,0,1,0), four=c(0,0,0,0,1))
df

I would like to gather the dummy variables one, two, three and four into a single variable, out, and recode its levels so that one = "A", two = "B", three = "C" and four = "D":

df.out<-data.frame(id=c(1,2,3,4,5),out=c(NA, "A","B","C","D" ))
df.out

Is it possible to do this in dplyr? I tried gather, but the output data frame ends up with a different number of rows, and I don't think pivot_wider and pivot_longer are the right choice.

The df is part of a much larger data frame, both wider and longer. I would like to be able to specify which columns to merge and how to recode them. Ideally I would like a tidyverse solution.

I am looking for the exact opposite of this

CodePudding user response：

You can do,

LETTERS[ifelse(rowSums(df[-1]) == 0, NA, max.col(df[-1]))]
#[1] NA  "A" "B" "C" "D"
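
Since the question asks for control over which columns are merged and how they are recoded, the same idea can be written with a named lookup vector. This is only a sketch in base R: `recode_map`, `cols`, `m` and `idx` are hypothetical names introduced here, and it assumes each row has at most one dummy set to 1.

```r
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 one = c(0, 1, 0, 0, 0), two = c(0, 0, 1, 0, 0),
                 three = c(0, 0, 0, 1, 0), four = c(0, 0, 0, 0, 1))

# Hypothetical lookup: names are the dummy columns, values are the new labels
recode_map <- c(one = "A", two = "B", three = "C", four = "D")
cols <- names(recode_map)

m <- as.matrix(df[cols])
# Column position of the largest value in each row; ties.method = "first"
# avoids max.col()'s default random tie-breaking
idx <- max.col(m, ties.method = "first")
# Rows with no dummy set get NA instead of a label
idx[rowSums(m) == 0] <- NA

# Map position -> column name -> label
df.out <- data.frame(id = df$id, out = unname(recode_map[cols[idx]]))
df.out
```

Changing `recode_map` is then enough to pick a different set of columns or different labels.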

CodePudding user response：

Here's an idea:

library(tidyverse)
ids <- 1:nrow(df)
df %>%
    mutate(id = ids) %>%
    pivot_longer(cols = one:four) %>%
    filter(value == 1) %>%
    mutate(out = plyr::mapvalues(
        name, 
        from = c("one", "two", "three", "four"), 
        to = LETTERS[1:4])) %>%
    select(-name, -value) %>%
    right_join(tibble(id = ids)) %>%
    arrange(id)
#> Joining, by = "id"
#> # A tibble: 5 × 2
#>      id out  
#>   <int> <chr>
#> 1     1 <NA> 
#> 2     2 A    
#> 3     3 B    
#> 4     4 C    
#> 5     5 D

Created on 2021-10-22 by the reprex package (v2.0.1)

Whether this is efficient, I'm not sure. I also needed to add an "id" variable to make it work this way.

CodePudding user response：

A dplyr only solution:

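library(dplyr)
df %>% 
  # Rename only the four dummy columns; the function must return one name per selected column
  rename_with(~ LETTERS[1:4], .cols = one:four) %>% 
  mutate(across(A:D, ~ case_when(. == 1 ~ cur_column()))) %>% 
  # Take the first non-NA label per row and drop the helper columns
  mutate(out = coalesce(A, B, C, D), .keep = "unused")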

output:

    id  out
1  1 <NA>
2  2    A
3  3    B
4  4    C
5  5    D
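
If the mapping is short enough to write out by hand, a single `case_when()` may also do, without renaming any columns. This is a sketch assuming the `df` from the question and at most one dummy set per row; rows matching none of the conditions get NA, which is `case_when()`'s default.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 one = c(0, 1, 0, 0, 0), two = c(0, 0, 1, 0, 0),
                 three = c(0, 0, 0, 1, 0), four = c(0, 0, 0, 0, 1))

df.out <- df %>%
  mutate(out = case_when(
    one == 1   ~ "A",
    two == 1   ~ "B",
    three == 1 ~ "C",
    four == 1  ~ "D"
    # no fallback branch: rows with all dummies at 0 stay NA
  )) %>%
  select(id, out)

df.out
```

The conditions are checked in order, so if several dummies were set in one row the first matching one would win.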

CodePudding user response：

I couldn't find a one-step solution, but this solution appears to give you the flexibility you want.

recode_fn <- function(x) {
  case_when(x == "one" ~ "A", x == "two" ~ "B", x == "three" ~ "C", x == "four" ~ "D", TRUE ~ "!! Error")
}

df %>% 
  pivot_longer(
    cols=c(one, two, three, four),
    names_to="Variable",
    values_to="Value"
  ) %>% 
  mutate(
    Variable=recode_fn(Variable)
  )
# A tibble: 20 × 2
   Variable Value
   <chr>    <dbl>
 1 A            0
 2 B            0
 3 C            0
 4 D            0
 5 A            1
 6 B            0
 7 C            0
 8 D            0
 9 A            0
10 B            1
11 C            0
12 D            0
13 A            0
14 B            0
15 C            1
16 D            0
17 A            0
18 B            0
19 C            0
20 D            1
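
The long table above still has one row per id and dummy. To get to the one-row-per-id shape asked for in the question, one possible continuation is to keep only the rows where the dummy is 1 and then join back onto the full list of ids, so that ids with no dummy set come back as NA. This is a sketch assuming the `df` from the question and at most one dummy set per row; `recode_map` is a hypothetical name used here in place of `recode_fn`.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 one = c(0, 1, 0, 0, 0), two = c(0, 0, 1, 0, 0),
                 three = c(0, 0, 0, 1, 0), four = c(0, 0, 0, 0, 1))

# Hypothetical lookup: names are the dummy columns, values are the new labels
recode_map <- c(one = "A", two = "B", three = "C", four = "D")

df.out <- df %>%
  pivot_longer(cols = all_of(names(recode_map)),
               names_to = "Variable", values_to = "Value") %>%
  # Keep only the dummy that is switched on for each id
  filter(Value == 1) %>%
  # Look up the new label by the original column name
  mutate(out = unname(recode_map[Variable])) %>%
  select(id, out) %>%
  # Restore ids that had no dummy set; their `out` becomes NA
  right_join(distinct(df, id), by = "id") %>%
  arrange(id)

df.out
```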

CodePudding user response：

You can write a function that returns NA when all the values are 0, and otherwise the index of the max value.

library(dplyr)

return_value <- function(x) {
  if(all(x == 0)) return(NA)
  else which.max(x)
}

df %>%
  rowwise() %>%
  mutate(out = return_value(c_across(one:four)))

#     id   one   two three  four   out
#  <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#1     1     0     0     0     0    NA
#2     2     1     0     0     0     1
#3     3     0     1     0     0     2
#4     4     0     0     1     0     3
#5     5     0     0     0     1     4

To get 'A', 'B', etc., you can use the built-in LETTERS.

df %>%
  rowwise() %>%
  mutate(out = return_value(c_across(one:four))) %>%
  ungroup() %>%
  mutate(out = LETTERS[out])