Comparing two columns in a dataframe using R-CodePudding

I am trying to compare two columns in a dataframe to find rows where the two columns are not equal.

I would do:

df %>% filter(column1 != column2)

This will give me cases where values exist in both columns and are not equal (e.g. column1 = 5, column2 = 6)

However it will not give me cases where one of the values is NA (e.g. column1 = NA, column2 = 7)

How can I include the latter case into the filter function?

Thanks

CodePudding user response：

Or use xor:

df %>% filter(a != b | xor(is.na(a), is.na(b)))

Or as @thelatemail mentioned, you could use Base R:

df[which(df$a != df$b | xor(is.na(df$a), is.na(df$b))),]

Or as @runr mentioned, you could try subset in Base R:

subset(df, a != b | xor(is.na(a), is.na(b)))

CodePudding user response：

You can include them with an OR (|) condition -

library(dplyr)

df <- data.frame(a = c(1, 2, NA, 4, 5), b = c(NA, 2, 3, 4, 8))

df %>% filter(a != b | is.na(a) | is.na(b))

#   a  b
#1  1 NA
#2 NA  3
#3  5  8

Another option would be to change NA values to string "NA" and then only using a != b should work.

df %>% 
  mutate(across(.fns = ~replace(., is.na(.), 'NA'))) %>%
  filter(a != b) %>%
  type.convert(as.is = TRUE)