I have two vectors:
a = strsplit("po","")[[1]]
[1] "p" "o"
b = strsplit("polo","")[[1]]
[1] "p" "o" "l" "o"
I'm trying to compare them using ==. Unfortunately, a==b gives an unexpected result:
a==b
[1] TRUE TRUE FALSE TRUE
While I expect to have:
[1] TRUE TRUE FALSE FALSE
So, what is causing this, and how can I achieve the expected result?
The problem seems to be related to the last element of both vectors being the same: changing b to e.g. polf does give the expected result, while setting b to pooo gives TRUE TRUE FALSE TRUE and not TRUE TRUE TRUE TRUE.
Edit
In other words, I'd expect missing elements (when lengths differ) to be passed as nothing. Only "" seems to give TRUE TRUE FALSE FALSE; NA and NULL give different results.
c("p","o","","")==c("p","o","l","o")
[1] TRUE TRUE FALSE FALSE
CodePudding user response:
The problem you've encountered here is due to recycling (not the eco-friendly kind). When applying an operation to two vectors that requires them to be the same length, R often automatically recycles, or repeats, the shorter one until it is long enough to match the longer one. Your unexpected results are due to the fact that R recycles the vector c("p", "o") to length 4 (the length of the longer vector), essentially converting it to c("p", "o", "p", "o"). Because 4 is a multiple of 2, R does this silently, without a warning. If we compare c("p", "o", "p", "o") and c("p", "o", "l", "o"), we get the same unexpected result as above:
c("p", "o", "p", "o") == c("p", "o", "l", "o")
#> [1] TRUE TRUE FALSE TRUE
It's not exactly clear to me why you would expect the result to be TRUE TRUE FALSE FALSE, as it's somewhat of an ambiguous comparison to compare a length-2 vector to a length-4 vector, and recycling the length-2 vector (which is what R is doing) seems to be the most reasonable default aside from throwing an error.
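As a minimal sketch of what recycling does here, the shorter vector can be expanded by hand with base R's rep_len() and then compared element by element. The name a_recycled is only for illustration.

```r
a <- c("p", "o")
b <- c("p", "o", "l", "o")

# Repeat a until it is as long as b, as R does implicitly for a == b
a_recycled <- rep_len(a, length(b))
print(a_recycled)

# The explicit comparison and the implicit (recycled) one agree
print(a_recycled == b)
print(identical(a == b, a_recycled == b))
```

If the longer length were not a multiple of the shorter one, R would still recycle but would also emit a warning.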
CodePudding user response:
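To get the result shown in the question, we can put the two vectors in a list, extend both to the maximum of their lengths (which pads the shorter one with NAs), and test whether the comparison is %in% TRUE. Comparing with NA yields NA, and NA %in% TRUE is FALSE, so the padded positions come out as FALSE.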
list(a, b) |>
(\(.) lapply(., `length<-`, max(lengths(.))))() |>
(\(.) do.call(\(x, y, ...) (x == y) %in% TRUE, .))()
# [1] TRUE TRUE FALSE FALSE
Note: R version 4.1.2 (2021-11-01)
Data:
a <- c("p", "o")
b <- c("p", "o", "l", "o")
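The same idea can be wrapped in a small function for reuse. This is only a sketch: compare_padded is a hypothetical helper name, not something from base R, and it assumes that positions missing from the shorter vector should count as non-matching.

```r
# Hypothetical helper (not part of base R): element-wise comparison
# where positions missing from the shorter vector count as FALSE.
compare_padded <- function(x, y) {
  n <- max(length(x), length(y))
  # Assigning a larger length pads the vector with NA
  length(x) <- n
  length(y) <- n
  # x == y is NA wherever either side is NA; %in% TRUE turns NA into FALSE
  (x == y) %in% TRUE
}

a <- c("p", "o")
b <- c("p", "o", "l", "o")
print(compare_padded(a, b))
```

One consequence of this design is that a genuine NA already present in either input is also reported as FALSE rather than NA.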