How do I compare 2 column values in R (using dplyr, preferably) and create a new variable which repr

I have 2 columns of variables and am trying to compare them to one another. I would like to compare the values of Test and More row by row, then output the result of the comparison in a third column (let's call it Match). If the values of Test and More match, I would like to output a 1 to Match, and if they do NOT match, I would like to output a 0 to Match. I would also like to tell R how to handle NAs in either the Test or More column (i.e., if there is an NA in either of those columns, output an NA in the Match column). I've been trying to do this using case_when statements but am hitting a wall. Thanks!

df = data.frame(id  = c(1, 2, 3),
                Test = c(3, 0, 1),
                More  = c(4, 0, 0))

CodePudding user response：

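You could use dplyr's mutate() with case_when(). A comparison involving NA evaluates to NA, so such rows satisfy neither of the first two conditions and fall through to the final TRUE branch:

library(dplyr)

# Example data with an NA added to show how missing values are handled
df = data.frame(id  = c(1, 2, 3, 4),
                Test = c(3, 0, 1, NA),
                More  = c(4, 0, 0, 1))

df2 <- df %>% 
  mutate(Match = case_when(
    Test == More ~ 1,       # values match
    Test != More ~ 0,       # values differ
    TRUE ~ NA_real_         # either value is NA
  ))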

CodePudding user response：

library(dplyr)
df %>% 
  mutate(Match = 1 * (Test == More))

The comparison Test == More yields TRUE, FALSE, or NA, depending on whether there is a match, a mismatch, or an NA in either Test or More. Multiplying TRUE / FALSE / NA by 1 then yields 1 / 0 / NA.
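
As a quick check that the two approaches agree, here is a minimal sketch that runs both on the same data. The extra row with NA in both columns and the column names Match_case and Match_int are assumptions added for illustration. It also uses as.integer() instead of multiplying by 1, which is just an alternative way to coerce the logical result and produces integers rather than doubles.

```r
library(dplyr)

# Illustrative data: row 4 has one NA, row 5 (an added assumption) has NA in both columns
df <- data.frame(id   = c(1, 2, 3, 4, 5),
                 Test = c(3, 0, 1, NA, NA),
                 More = c(4, 0, 0, 1, NA))

res <- df %>%
  mutate(
    # Explicit branches; NA comparisons fall through to the TRUE branch
    Match_case = case_when(
      Test == More ~ 1L,
      Test != More ~ 0L,
      TRUE ~ NA_integer_
    ),
    # Direct coercion of the logical comparison (TRUE/FALSE/NA -> 1/0/NA)
    Match_int = as.integer(Test == More)
  )

# Both columns should hold the same values
identical(res$Match_case, res$Match_int)
```

Note that NA == NA is itself NA, so a row where both values are missing gets NA in Match rather than 1. If two missing values should count as a match, that case would need its own case_when branch placed before the others.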