Home > Net >  How to generate a column based on matches over multiple columns in dplyr?
How to generate a column based on matches over multiple columns in dplyr?

Time:11-10

I am at loss on how to generate a new column with dplyr when using a conditional statement across multiple columns.

Given some data

library(dplyr)
 a <- data.frame(var1 = c("one", "two", "three"), var2 = c("three", "one", "three"), var3 = c("three", "two", NA))
> a
   var1  var2  var3
1   one three three
2   two   one   two
3 three three  <NA>

I would like to compute a new column matching a set of conditions (whether "one", "two", or "three" are present or not; if so, return an arbitrary value 1, 2, or 3), in order of priority where "one" would have the highest priority and "three" the lowest (i.e., the returned vector should be 1 1 3).

My way of approaching it would be

a %>%
mutate(new_variable = case_when(
             "one" %in% across(starts_with("var")) ~ 1,
             "two" %in% across(starts_with("var")) ~ 2,
             "three" %in% across(starts_with("var")) ~ 3,
             TRUE ~ NA))

This obviously doesn't work and I suspect it would search for a match the full three columns, if it did. Is there a way in tidyverse to do it? Thanks!

CodePudding user response:

A base R solution:

mapping <- c("one" = 1, "two" = 2, "three" = 3)

apply(a, 1, \(x) min(mapping[x], na.rm = TRUE))

#> [1] 1 1 3

CodePudding user response:

Here's a tidyverse solution:

a %>%
  mutate(id = row_number()) %>%
  pivot_longer(-id) %>%
  mutate(new_var = ifelse(value == "one", 1,
                          ifelse(value == "two", 2, 3))
         ) %>%
  group_by(id) %>%
  summarise(new_var = min(new_var, na.rm = TRUE)) %>%
  bind_cols(a, .) %>%
  select(-id)
   var1  var2  var3 new_var
1   one three three       1
2   two   one   two       1
3 three three  <NA>       3
  • Related