I am at loss on how to generate a new column with dplyr
when using a conditional statement across multiple columns.
Given some data
library(dplyr)
a <- data.frame(var1 = c("one", "two", "three"), var2 = c("three", "one", "three"), var3 = c("three", "two", NA))
> a
var1 var2 var3
1 one three three
2 two one two
3 three three <NA>
I would like to compute a new column matching a set of conditions (whether "one"
, "two"
, or "three"
are present or not; if so, return an arbitrary value 1
, 2
, or 3
), in order of priority where "one"
would have the highest priority and "three"
the lowest (i.e., the returned vector should be 1 1 3
).
My way of approaching it would be
a %>%
mutate(new_variable = case_when(
"one" %in% across(starts_with("var")) ~ 1,
"two" %in% across(starts_with("var")) ~ 2,
"three" %in% across(starts_with("var")) ~ 3,
TRUE ~ NA))
This obviously doesn't work and I suspect it would search for a match the full three columns, if it did. Is there a way in tidyverse
to do it?
Thanks!
CodePudding user response:
A base R solution:
mapping <- c("one" = 1, "two" = 2, "three" = 3)
apply(a, 1, \(x) min(mapping[x], na.rm = TRUE))
#> [1] 1 1 3
CodePudding user response:
Here's a tidyverse
solution:
a %>%
mutate(id = row_number()) %>%
pivot_longer(-id) %>%
mutate(new_var = ifelse(value == "one", 1,
ifelse(value == "two", 2, 3))
) %>%
group_by(id) %>%
summarise(new_var = min(new_var, na.rm = TRUE)) %>%
bind_cols(a, .) %>%
select(-id)
var1 var2 var3 new_var
1 one three three 1
2 two one two 1
3 three three <NA> 3