I am trying to compare two columns in a dataframe to find rows where the two columns are not equal.
I would do:
df %>% filter(column1 != column2)
This will give me cases where values exist in both columns and are not equal (e.g. column1 = 5, column2 = 6)
However it will not give me cases where one of the values is NA (e.g. column1 = NA, column2 = 7)
How can I include the latter case into the filter function?
Thanks
CodePudding user response:
Or use xor
:
df %>% filter(a != b | xor(is.na(a), is.na(b)))
Or as @thelatemail mentioned, you could use Base R:
df[which(df$a != df$b | xor(is.na(df$a), is.na(df$b))),]
Or as @runr mentioned, you could try subset
in Base R:
subset(df, a != b | xor(is.na(a), is.na(b)))
CodePudding user response:
You can include them with an OR (|
) condition -
library(dplyr)
df <- data.frame(a = c(1, 2, NA, 4, 5), b = c(NA, 2, 3, 4, 8))
df %>% filter(a != b | is.na(a) | is.na(b))
# a b
#1 1 NA
#2 NA 3
#3 5 8
Another option would be to change NA
values to string "NA" and then only using a != b
should work.
df %>%
mutate(across(.fns = ~replace(., is.na(.), 'NA'))) %>%
filter(a != b) %>%
type.convert(as.is = TRUE)