I have this data frame:
df<-data.frame(id=c(1,2,3,4,5),one=c(0,1,0,0,0), two=c(0,0,1,0,0), three=c(0,0,0,1,0), four=c(0,0,0,0,1))
df
I would like to gather the dummy variables one, two, three and four into a single variable, out, and recode its levels so that one="A", two="B", three="C" and four="D". The desired result is:
df.out<-data.frame(id=c(1,2,3,4,5),out=c(NA, "A","B","C","D" ))
df.out
Is it possible to do this in dplyr? I tried gather, but the output data frame ends up with a different length, and I don't think pivot_wider and pivot_longer are the right choice.
The df is part of a much larger (wider and longer) data frame. I would like to be able to specify which columns to merge and how to recode them, ideally with a tidyverse solution.
I am looking for the exact opposite of this
CodePudding user response:
You can do,
LETTERS[ifelse(rowSums(df[-1]) == 0, NA, max.col(df[-1]))]
#[1] NA "A" "B" "C" "D"
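Since the question asks for control over which columns are merged and how they are recoded, here is a minimal base R sketch of the same idea driven by a named lookup vector. The name recode_map is hypothetical (it is not part of the original answer), and the sketch assumes each row has at most one dummy set to 1.

```r
# Example data copied from the question.
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 one = c(0, 1, 0, 0, 0), two = c(0, 0, 1, 0, 0),
                 three = c(0, 0, 0, 1, 0), four = c(0, 0, 0, 0, 1))

# Hypothetical lookup: names are the dummy columns to merge,
# values are the labels they should be recoded to.
recode_map <- c(one = "A", two = "B", three = "C", four = "D")

m <- as.matrix(df[names(recode_map)])     # keep only the columns to merge
idx <- max.col(m, ties.method = "first")  # column position of the 1 in each row
idx[rowSums(m) == 0] <- NA                # rows where no dummy is set

out <- unname(recode_map[idx])            # translate positions into labels
df.out <- data.frame(id = df$id, out = out)
df.out
```

Changing the names and values of recode_map is enough to pick a different set of columns or a different recoding.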
CodePudding user response:
Here's an idea:
library(tidyverse)
ids <- 1:nrow(df)
df %>%
  mutate(id = ids) %>%
  pivot_longer(cols = one:four) %>%
  filter(value == 1) %>%
  mutate(out = plyr::mapvalues(
    name,
    from = c("one", "two", "three", "four"),
    to = LETTERS[1:4])) %>%
  select(-name, -value) %>%
  right_join(tibble(id = ids)) %>%
  arrange(id)
#> Joining, by = "id"
#> # A tibble: 5 × 2
#> id out
#> <int> <chr>
#> 1 1 <NA>
#> 2 2 A
#> 3 3 B
#> 4 4 C
#> 5 5 D
Created on 2021-10-22 by the reprex package (v2.0.1)
Whether this is efficient, I'm not sure. I also needed to add an "id" variable to make it work this way.
CodePudding user response:
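A dplyr-only solution:
library(dplyr)
df %>%
  # rename only the dummy columns to their new labels
  rename_with(~ LETTERS[1:4], .cols = one:four) %>%
  # replace each 1 with the column's own name; everything else becomes NA
  mutate(across(A:D, ~ case_when(. == 1 ~ cur_column()))) %>%
  # take the first non-NA label per row and drop the helper columns
  mutate(out = coalesce(A, B, C, D), .keep = "unused")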
output:
id out
1 1 <NA>
2 2 A
3 3 B
4 4 C
5 5 D
CodePudding user response:
I couldn't find a one-step solution, but this approach appears to give you the flexibility you want.
recode_fn <- function(x) {
  case_when(
    x == "one" ~ "A",
    x == "two" ~ "B",
    x == "three" ~ "C",
    x == "four" ~ "D",
    TRUE ~ "!! Error"
  )
}
df %>%
  pivot_longer(
    cols = c(one, two, three, four),
    names_to = "Variable",
    values_to = "Value"
  ) %>%
  mutate(
    Variable = recode_fn(Variable)
  )
# A tibble: 20 × 2
Variable Value
<chr> <dbl>
1 A 0
2 B 0
3 C 0
4 D 0
5 A 1
6 B 0
7 C 0
8 D 0
9 A 0
10 B 1
11 C 0
12 D 0
13 A 0
14 B 0
15 C 1
16 D 0
17 A 0
18 B 0
19 C 0
20 D 1
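The long table above still has one row per dummy column, so one more step is needed to reach the one-row-per-id layout the question asks for. A possible sketch of that step follows. It assumes the data frame has the id column shown in the question, that at most one dummy is set per id, and it uses a hypothetical named vector recode_map in place of recode_fn.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question.
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 one = c(0, 1, 0, 0, 0), two = c(0, 0, 1, 0, 0),
                 three = c(0, 0, 0, 1, 0), four = c(0, 0, 0, 0, 1))

# Hypothetical lookup: names are the columns to merge, values are the new labels.
recode_map <- c(one = "A", two = "B", three = "C", four = "D")

df.out <- df %>%
  pivot_longer(
    cols = all_of(names(recode_map)),
    names_to = "Variable",
    values_to = "Value"
  ) %>%
  group_by(id) %>%
  # Keep the label of the dummy that equals 1; subsetting an empty
  # result with [1] yields NA for ids where no dummy is set.
  summarise(out = unname(recode_map[Variable[Value == 1]][1]),
            .groups = "drop")

df.out
```

Because the recoding happens inside summarise, no join back to the full list of ids is needed to keep the all-zero rows.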
CodePudding user response:
You can write a function that returns NA when all the values are 0, and otherwise returns the index of the maximum value.
library(dplyr)
return_value <- function(x) {
  if(all(x == 0)) return(NA)
  else which.max(x)
}
df %>%
  rowwise() %>%
  mutate(out = return_value(c_across(one:four)))
# id one two three four out
# <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#1 1 0 0 0 0 NA
#2 2 1 0 0 0 1
#3 3 0 1 0 0 2
#4 4 0 0 1 0 3
#5 5 0 0 0 1 4
To get 'A', 'B', etc., you can use the built-in LETTERS.
df %>%
  rowwise() %>%
  mutate(out = return_value(c_across(one:four))) %>%
  ungroup() %>%
  mutate(out = LETTERS[out])