Home > Software engineering >  Equality of columns using dplyr - problem with missing values
Equality of columns using dplyr - problem with missing values

Time:11-10

I am using the dplyr package in R to test equality of two columns using the code below. The results work well except for missing values where neither TRUE nor FALSE is returned

mutate(check = if_else(only == count, TRUE, FALSE))

Any ideas on how I can tweak this syntax?

Thanks in advance!

CodePudding user response:

Is this what you are looking for?

library(dplyr)

dat <- data.frame(only = c(1, NA, 2, 3, NA),
                  count = c(1, NA, 3, 2, 1))

dat %>% 
  mutate(check = if_else(only == count | is.na(only) & is.na(count), 
                         TRUE, FALSE, missing = FALSE))

  only count check
1    1     1  TRUE
2   NA    NA  TRUE
3    2     3 FALSE
4    3     2 FALSE
5   NA     1 FALSE

CodePudding user response:

mutate(check = if_else(only == count |
                         is.na(only) == is.na(count), TRUE, FALSE))

CodePudding user response:

You can try case_when. The following code means if not col1 == col2, which includes col1 != col2 or columns are NA, it will return FALSE.

# Example data frame
dat <- data.frame(
  col1 = c(1, 2, 3, 4, 5),
  col2 = c(1, 2, NA, 4, 6)
)

library(dplyr)

dat2 <- dat %>%
  mutate(check = case_when(
    col1 == col2           ~TRUE,
    TRUE                   ~FALSE
  ))

print(dat2)
#   col1 col2 check
# 1    1    1  TRUE
# 2    2    2  TRUE
# 3    3   NA FALSE
# 4    4    4  TRUE
# 5    5    6 FALSE
  • Related