Matching unordered values in R

Time:03-10

I'm working with a dataset with two columns that look something like this:

Row1      Row2
1, 2      2, 5
2, 6, 4   2, 6
3, 1      1, 3
2, 1, 4   1, 4, 2
3         3, 2

I want to run a script that identifies whether Row2 matches Row1. They need to have exactly the same values, but the values don't need to be in the same order. So given the above, I'd want a result that tells me the following:

Row1      Row2      Match
1, 2      2, 5      FALSE
2, 6, 4   2, 6      FALSE
3, 1      1, 3      TRUE
2, 1, 4   1, 4, 2   TRUE
3         3, 2      FALSE

I've tried using match() and compare() and haven't had success with either. match() produces TRUE as long as all the elements of Row1 are found in Row2, but this isn't what I'm looking for. I need TRUE only when Row2 has exactly the same numbers as Row1 and only those numbers, irrespective of order. compare(), on the other hand, produces an error if I try to create a new column to identify matches. This is what I enter:

df$match <- compareIgnoreOrder(df$row1, df$row2)

I've also tried this way:

df$match <- compare(df$row1, df$row2, ignoreAll = TRUE)

Both methods yield the following error: "Input must be a vector, not a object." And at this point I'm stuck. I've searched high and low but can't find any solutions. Help would be much appreciated.

CodePudding user response:

Something like:

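library(dplyr)  # provides %>%, rowwise(), mutate(), and tibble()

# Row by row: two sets are equal when their intersection is as large as their union
data %>%
  rowwise() %>%
  mutate(Match = length(intersect(Row1, Row2)) == length(union(Row1, Row2)))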

Output:

  Row1      Row2      Match
  <list>    <list>    <lgl>
1 <dbl [2]> <dbl [2]> FALSE
2 <dbl [3]> <dbl [2]> FALSE
3 <dbl [2]> <dbl [2]> TRUE 
4 <dbl [3]> <dbl [3]> TRUE 
5 <dbl [1]> <dbl [2]> FALSE

Input:

data <- tibble(
  Row1 = list(c(1,2), c(2,6,4), c(3,1), c(2,1,4), c(3)),
  Row2 = list(c(2,5), c(2,6), c(1,3), c(1,4,2), c(3,2))
)
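
The input above assumes list-columns. If the real columns instead hold comma-separated strings such as "1, 2" (an assumption, since the question doesn't show the column types), they would need to be split into numeric vectors first. A minimal base R sketch, where `raw` and `parse_values` are hypothetical names used only for illustration:

```r
# Hypothetical raw data: each cell is a comma-separated string (assumed format).
raw <- data.frame(
  Row1 = c("1, 2", "2, 6, 4", "3, 1", "2, 1, 4", "3"),
  Row2 = c("2, 5", "2, 6", "1, 3", "1, 4, 2", "3, 2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each string on commas (allowing surrounding
# whitespace) and convert the pieces to numeric vectors.
parse_values <- function(x) {
  lapply(strsplit(trimws(x), "\\s*,\\s*"), as.numeric)
}

# Compare the parsed vectors pairwise as sets, ignoring order.
raw$Match <- mapply(setequal, parse_values(raw$Row1), parse_values(raw$Row2))
```

The same `parse_values()` output could also be stored as list-columns and passed to the rowwise() approach.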

CodePudding user response:

You're comparing sets, so a set operation like ?setequal makes sense to me:

dat <- data.frame(
  Row1 = I(list(c(1,2), c(2,6,4), c(3,1), c(2,1,4), c(3))),
  Row2 = I(list(c(2,5), c(2,6), c(1,3), c(1,4,2), c(3,2)))
)

mapply(setequal, dat$Row1, dat$Row2)
##[1] FALSE FALSE  TRUE  TRUE FALSE
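
One caveat: setequal() treats its arguments as sets, so repeated values are ignored. If repeats should count (the question doesn't say either way), comparing the sorted vectors is one option. A rough sketch, with `same_values` as a hypothetical helper name; it assumes there are no NA values:

```r
# Hypothetical helper: TRUE only if both vectors hold the same values
# the same number of times, in any order. Assumes no NAs.
same_values <- function(x, y) {
  length(x) == length(y) && all(sort(x) == sort(y))
}

setequal(c(1, 1, 2), c(1, 2))        # duplicates are ignored by setequal()
same_values(c(1, 1, 2), c(1, 2))     # lengths differ, so not a match
same_values(c(2, 1, 4), c(1, 4, 2))  # same values, different order
```

It can be dropped into the same call in place of setequal, e.g. mapply(same_values, dat$Row1, dat$Row2).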