I have a two-argument function. Its first argument is a triple of pairs of numbers, written as a character string of the form "(a, b)(c, d)(e, f)". Its second argument is a single pair, also written as a character string, of the form "(a, b)". The function returns a logical that states whether the pair (the second argument) is one of the three pairs in the triple (the first argument). I actually wrote two versions:
version1 <- function(x, y){ # x is a triple of pairs, y is a pair
  pairsfromthistriple <- paste(c("", "(", "("), strsplit(x, split = ")(", fixed = TRUE)[[1]], c(")", ")", ""), sep = "")
  y %in% pairsfromthistriple
}
version2 <- function(x, y){ # x is a triple of pairs, y is a pair
  y == substr(x, 1, 6) | y == substr(x, 7, 12) | y == substr(x, 13, 18)
}
I want to set this function loose on every triple of pairs from a vector of triples and every pair from some vector of pairs, using outer. Here I'll use the following very short vectors:
triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
names(triples) <- triples
pairs <- c("(5, 6)", "(3, 5)")
names(pairs) <- pairs
So here we go:
test1 <- outer(X = triples, Y = pairs, FUN = version1)
test2 <- outer(X = triples, Y = pairs, FUN = version2)
test2 evaluates to exactly what you would expect, but test1 gives a nonsensical output:
> test1
                   (5, 6) (3, 5)
(1, 2)(3, 4)(5, 6)   TRUE  FALSE
(1, 2)(3, 5)(4, 6)   TRUE  FALSE
> test2
                   (5, 6) (3, 5)
(1, 2)(3, 4)(5, 6)   TRUE  FALSE
(1, 2)(3, 5)(4, 6)  FALSE   TRUE
The natural conclusion is that there is an error in version1, but it is not as simple as that. Computing the entries of the matrix 'manually' using version1 gives:
> version1(triples[1], pairs[1])
[1] TRUE
> version1(triples[1], pairs[2])
[1] FALSE
> version1(triples[2], pairs[1])
[1] FALSE
> version1(triples[2], pairs[2])
[1] TRUE
exactly as it should! So at least part of the fault is with the function outer. In fact, what happens (it is not so clear in this small example, but it is very visible in larger ones) is that outer correctly computes the first row of its output matrix, but then copies this first row over and over to make up the subsequent rows. Obviously this is not what I want. If I only wanted to compute version1(x, y) for all y in some vector but just one single x, I would have used sapply rather than outer.
What is going on here?
CodePudding user response:
Note this detail from the documentation for ?outer:
X and Y must be suitable arguments for FUN. Each will be extended by rep to length the products of the lengths of X and Y before FUN is called.
FUN is called with these two extended vectors as arguments (plus any arguments in ...). It must be a vectorized function (or the name of one) expecting at least two arguments and returning a value with the same length as the first (and the second).
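To make that concrete, here is a rough sketch of the expansion outer performs before its single call to FUN, using the vectors from the question (names dropped for brevity). The variable names x_long, y_long, flat, and manual are only illustrative, and this is an approximation of the documented behaviour for plain vectors, not outer's actual source.

```r
triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
pairs <- c("(5, 6)", "(3, 5)")

# version1 as defined in the question
version1 <- function(x, y) {
  pairsfromthistriple <- paste(c("", "(", "("),
                               strsplit(x, split = ")(", fixed = TRUE)[[1]],
                               c(")", ")", ""), sep = "")
  y %in% pairsfromthistriple
}

# Both inputs are stretched to length(triples) * length(pairs)
x_long <- rep(triples, times = length(pairs))  # t1, t2, t1, t2
y_long <- rep(pairs, each = length(triples))   # p1, p1, p2, p2

# FUN is called once on the stretched vectors, not once per cell.
# version1 only ever looks at x_long[1], so every pair is checked
# against the first triple.
flat <- version1(x_long, y_long)
flat
#> [1]  TRUE  TRUE FALSE FALSE

# The flat result is then folded into a matrix, column by column
manual <- matrix(flat, nrow = length(triples), ncol = length(pairs))
```

Folded column by column, those four values give exactly the test1 matrix from the question: each row repeats the result for the first triple.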
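Your version1 function is not vectorized the way version2 is. You can see this by calling both directly on the original triples and pairs vectors. An element-wise comparison should return TRUE in both positions, since pairs[1] occurs in triples[1] and pairs[2] occurs in triples[2]: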
version1(triples, pairs)
#> [1] TRUE FALSE
version2(triples, pairs)
#> (5, 6) (3, 5)
#> TRUE TRUE
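Your version1 function seems designed for use with apply(), because it only handles one triple at a time: strsplit() returns a list with one element per input string, but [[1]] keeps only the pieces of the first triple. Every element of y is then compared against that one triple, which is why outer appears to repeat the same result for every triple. If you want to keep the approach of splitting the string, you have to use the apply family of functions so that each triple is split and compared separately. Without them, you expand the triples (x) vector into something much longer than y, and you can't do an element-wise comparison.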
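If you do want to keep version1 as written, one possible way to do that is to wrap it with base R's Vectorize(), which uses mapply() to call the function once per element pair. The name version1_vec is just an illustrative choice, and names on the input vectors are dropped here to keep the sketch short.

```r
triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
pairs <- c("(5, 6)", "(3, 5)")

# version1 as defined in the question
version1 <- function(x, y) {
  pairsfromthistriple <- paste(c("", "(", "("),
                               strsplit(x, split = ")(", fixed = TRUE)[[1]],
                               c(")", ")", ""), sep = "")
  y %in% pairsfromthistriple
}

# Wrapper that calls version1(x[i], y[i]) for each i
version1_vec <- Vectorize(version1, USE.NAMES = FALSE)

# Element-wise call now agrees with version2
version1_vec(triples, pairs)
#> [1] TRUE TRUE

# outer() gets the vectorized behaviour it requires:
# rows are triples, columns are pairs
fixed_test1 <- outer(X = triples, Y = pairs, FUN = version1_vec)
```

This costs one R-level function call per cell, so for large inputs a natively vectorized function will usually be faster.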
However, I would just use something very simple. stringr::str_detect is already vectorized over string and pattern, so you can use it directly.
library(stringr)
outer(X = triples, Y = pairs, FUN = str_detect)
#> (5, 6) (3, 5)
#> (1, 2)(3, 4)(5, 6) TRUE FALSE
#> (1, 2)(3, 5)(4, 6) FALSE TRUE
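One caveat: str_detect treats its pattern as a regular expression by default, so the parentheses in "(5, 6)" act as a grouping construct rather than as literal characters. That makes no difference for the data above, but it can for other inputs. As a hedged alternative, here is a base R sketch that matches the pair literally; contains_pair is a hypothetical helper name, not an existing function.

```r
triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
pairs <- c("(5, 6)", "(3, 5)")

# Hypothetical helper: literal substring test, vectorized over both
# arguments (grepl() itself is not vectorized over its pattern)
contains_pair <- function(x, y) {
  mapply(function(triple, pair) grepl(pair, triple, fixed = TRUE),
         x, y, USE.NAMES = FALSE)
}

result <- outer(X = triples, Y = pairs, FUN = contains_pair)

# A made-up input where regex and literal matching disagree:
regex_hit   <- grepl("(5, 6)", "(1, 2)(3, 4)(15, 6)")                # regex matches "5, 6"
literal_hit <- grepl("(5, 6)", "(1, 2)(3, 4)(15, 6)", fixed = TRUE)  # no literal "(5, 6)"
```

Here result has the same TRUE/FALSE pattern as the str_detect output, while regex_hit is TRUE and literal_hit is FALSE.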