Unexpected result comparing strings with `==`

Time:12-26

I have two vectors:

a = strsplit("po","")[[1]]
[1] "p" "o"

b = strsplit("polo","")[[1]]
[1] "p" "o" "l" "o"

I'm trying to compare them using ==. Unfortunately, a==b gives an unexpected result.

a==b
[1]  TRUE  TRUE FALSE  TRUE

While I expect to have:

[1]  TRUE  TRUE FALSE FALSE

So what is causing this, and how can one achieve the expected result?

The problem seems to be related to the last element of both vectors being the same: changing b to e.g. polf does give the expected result. Also, setting b to pooo gives TRUE TRUE FALSE TRUE and not TRUE TRUE TRUE TRUE.

Edit

In other words, I'd expect missing elements (when the lengths differ) to be treated as nothing. Of the placeholders I tried, only "" gives TRUE TRUE FALSE FALSE; NA and NULL give different results.

c("p","o","","")==c("p","o","l","o")
[1]  TRUE  TRUE FALSE FALSE
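
For reference, here is a small sketch of what the NA and NULL placeholders do in the same comparison. The variable names (target, with_na, with_null) are only illustrative and are not part of the original post.

```r
target <- c("p", "o", "l", "o")

# NA placeholders: comparing anything with NA yields NA, not FALSE
with_na <- c("p", "o", NA, NA) == target
with_na
#> [1] TRUE TRUE   NA   NA

# NULL placeholders: c() drops NULL, so the vector still has length 2 ...
length(c("p", "o", NULL, NULL))
#> [1] 2

# ... and the comparison behaves exactly like c("p", "o") == target
with_null <- c("p", "o", NULL, NULL) == target
with_null
#> [1]  TRUE  TRUE FALSE  TRUE
```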

CodePudding user response:

The problem you've encountered here is due to recycling (not the eco-friendly kind). When an operation on two vectors requires them to have the same length, R usually recycles (repeats) the shorter one until it matches the length of the longer one. Here R recycles c("p", "o") to length 4, the length of the longer vector, so it is effectively treated as c("p", "o", "p", "o"). Comparing c("p", "o", "p", "o") with c("p", "o", "l", "o") reproduces the unexpected result from the question:

c("p", "o", "p", "o") == c("p", "o", "l", "o")
#> [1]  TRUE  TRUE FALSE  TRUE

It's not entirely clear to me why you would expect TRUE TRUE FALSE FALSE: comparing a length-2 vector with a length-4 vector is inherently ambiguous, and recycling the shorter vector (which is what R does) seems to be the most reasonable default short of throwing an error.
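
As a rough illustration, the recycling can be made explicit with base R's rep_len(). This is only a sketch: a and b are the vectors from the question, and a_recycled is an illustrative name introduced here.

```r
a <- c("p", "o")
b <- c("p", "o", "l", "o")

# rep_len() repeats a until it reaches the requested length
a_recycled <- rep_len(a, length(b))
a_recycled
#> [1] "p" "o" "p" "o"

# The explicit version gives the same result as the implicit recycling in a == b
identical(a == b, a_recycled == b)
#> [1] TRUE
```

When the longer length is not a multiple of the shorter one (for example, a length-3 vector compared with a length-4 vector), R still recycles but emits a warning that the longer object length is not a multiple of the shorter object length.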

CodePudding user response:

To get the result shown in the OP, we can put the two vectors in a list, extend both to the maximum length (which pads the shorter one with NAs), and then test whether each comparison is %in% TRUE, so that comparisons involving NA become FALSE.

list(a, b) |>
  (\(.) lapply(., `length<-`, max(lengths(.))))() |>
  (\(.) do.call(\(x, y, ...) (x == y) %in% TRUE, .))()
# [1]  TRUE  TRUE FALSE FALSE

Note: R version 4.1.2 (2021-11-01). The native pipe |> and the \(x) function shorthand require R 4.1.0 or later.
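
The same pad-with-NA idea can also be sketched as a small helper without the pipe, which may be easier to read and should work on older R versions too. The function name compare_padded is hypothetical and not part of the answer above.

```r
# Hypothetical helper: element-wise comparison where positions missing
# from the shorter vector count as FALSE instead of being recycled
compare_padded <- function(x, y) {
  n <- max(length(x), length(y))
  length(x) <- n        # extending a vector pads it with NA
  length(y) <- n
  (x == y) %in% TRUE    # NA %in% TRUE is FALSE, so NA comparisons become FALSE
}

compare_padded(c("p", "o"), c("p", "o", "l", "o"))
#> [1]  TRUE  TRUE FALSE FALSE
```

One consequence of this design is that a genuine NA in either input is also reported as FALSE, which may or may not be what is wanted.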


Data:

a <- c("p", "o")
b <- c("p", "o", "l", "o")