outer reuses first element of X instead of doing its job

I have a two-argument function. Its first argument is a triple of pairs of numbers, written as a character string of the form "(a, b)(c, d)(e, f)". Its second argument is a single pair, also written as a character string, of the form "(a, b)". The function returns a logical stating whether the pair (the second argument) is one of the three pairs in the triple (the first argument). I actually wrote two versions:

version1 <- function(x, y){ # x is a triple of pairs, y is a pair
  pairsfromthistriple <- paste(c("", "(", "("), strsplit(x, split = ")(", fixed = TRUE)[[1]], c(")", ")", ""), sep = "")
  y %in% pairsfromthistriple
}

version2 <- function(x, y){#x is triple of pairs, y is pair
  y == substr(x, 1, 6) | y == substr(x, 7, 12) | y == substr(x, 13, 18)
}

I want to apply this function to every combination of a triple from a vector of triples and a pair from a vector of pairs, using outer. For this example I'll use the following very short vectors:

triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
names(triples) <- triples
pairs <- c("(5, 6)", "(3, 5)")
names(pairs) <- pairs

So here we go:

test1 <- outer(X = triples, Y = pairs, FUN = version1)
test2 <- outer(X = triples, Y = pairs, FUN = version2)

test2 evaluates to exactly what you would expect, but test1 gives nonsensical output:

> test1
                   (5, 6) (3, 5)
(1, 2)(3, 4)(5, 6)   TRUE  FALSE
(1, 2)(3, 5)(4, 6)   TRUE  FALSE

> test2
                   (5, 6) (3, 5)
(1, 2)(3, 4)(5, 6)   TRUE  FALSE
(1, 2)(3, 5)(4, 6)  FALSE   TRUE

The natural conclusion is that there is an error in version1, but it is not as simple as that. 'Manually' computing the terms in the matrix using version1 gives:

> version1(triples[1], pairs[1])
[1] TRUE
> version1(triples[1], pairs[2])
[1] FALSE
> version1(triples[2], pairs[1])
[1] FALSE
> version1(triples[2], pairs[2])
[1] TRUE

exactly as it should! So at least part of the fault is with the function outer. In fact what happens (in this small example it is not so clear, but this is very visible in larger examples) is that outer correctly computes the first row of its output matrix, but then copies this first row over and over to make up the subsequent rows. Obviously this is not what I want. If I only wanted to compute version1(x, y) for all y in some vector but just one single x, I would have used sapply rather than outer.

What is going on here?

CodePudding user response:

Note this detail from the documentation for ?outer:

X and Y must be suitable arguments for FUN. Each will be extended by rep to length the products of the lengths of X and Y before FUN is called.

FUN is called with these two extended vectors as arguments (plus any arguments in ...). It must be a vectorized function (or the name of one) expecting at least two arguments and returning a value with the same length as the first (and the second).
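
To see what that means here, the expansion can be reproduced by hand. This is only a sketch that mirrors the documented behavior with rep(), not outer's actual source, and the names X_ext, Y_ext and res are made up for illustration.

```r
triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
pairs <- c("(5, 6)", "(3, 5)")

# version1 exactly as in the question (only T spelled out as TRUE)
version1 <- function(x, y) {
  pairsfromthistriple <- paste(c("", "(", "("),
                               strsplit(x, split = ")(", fixed = TRUE)[[1]],
                               c(")", ")", ""), sep = "")
  y %in% pairsfromthistriple
}

# Every combination of a triple and a pair, as two vectors of length 2 * 2 = 4
X_ext <- rep(triples, times = length(pairs))  # t1, t2, t1, t2
Y_ext <- rep(pairs, each = length(triples))   # p1, p1, p2, p2

# FUN is called once on the full vectors, not once per combination
res <- version1(X_ext, Y_ext)
res
#> [1]  TRUE  TRUE FALSE FALSE

# The result is then reshaped column by column into a 2 x 2 matrix
matrix(res, nrow = length(triples))
```

Because of the [[1]], only the pairs of the first triple are ever looked at, so each column just repeats whether that pair belongs to triples[1]. That is the test1 matrix from the question.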

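Your version1 function is not properly vectorized, whereas version2 is. You can see this by calling each one directly on the original triples and pairs vectors. Both calls should return the same element-wise result (TRUE TRUE, since each pair occurs in the triple at the same position), but they do not: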

version1(triples, pairs)
#> [1]  TRUE FALSE
version2(triples, pairs)
#> (5, 6) (3, 5) 
#>   TRUE   TRUE

Your version1 function seems designed to be called on one element at a time, for example from the apply family: strsplit() returns a list with one element per string in x, but [[1]] keeps only the first, so every y is compared against the pairs of the first triple alone. If you want to keep the approach of splitting the string, you would have to use the apply family of functions to handle one triple at a time. Without them, splitting expands x into something much longer than y, and you can't do an element-wise comparison.
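
One way to keep the splitting approach is to let base R's Vectorize() do the element-wise looping; it wraps the function in mapply(). This is a minimal sketch, and the names version1_vec and test1_fixed are chosen here, not taken from the question.

```r
triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
names(triples) <- triples
pairs <- c("(5, 6)", "(3, 5)")
names(pairs) <- pairs

# version1 exactly as in the question (only T spelled out as TRUE)
version1 <- function(x, y) {
  pairsfromthistriple <- paste(c("", "(", "("),
                               strsplit(x, split = ")(", fixed = TRUE)[[1]],
                               c(")", ")", ""), sep = "")
  y %in% pairsfromthistriple
}

# Vectorize() returns a wrapper that calls version1 once per (x, y)
# combination via mapply(), so [[1]] always sees a single triple
version1_vec <- Vectorize(version1)

test1_fixed <- outer(X = triples, Y = pairs, FUN = version1_vec)
test1_fixed
#>                    (5, 6) (3, 5)
#> (1, 2)(3, 4)(5, 6)   TRUE  FALSE
#> (1, 2)(3, 5)(4, 6)  FALSE   TRUE
```

Note that this still calls version1 once for every cell of the matrix, so for large inputs it will be slower than a function that is vectorized natively, such as version2.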

However, I would just use something very simple. stringr::str_detect is already vectorized over both string and pattern, so you can use it directly.

library(stringr)

outer(X = triples, Y = pairs, FUN = str_detect)
#>                    (5, 6) (3, 5)
#> (1, 2)(3, 4)(5, 6)   TRUE  FALSE
#> (1, 2)(3, 5)(4, 6)  FALSE   TRUE
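
One caveat: str_detect() interprets the pattern as a regular expression by default, so the parentheses in "(5, 6)" form a group instead of matching literally. That happens to give the right answer for these inputs, but it could produce false positives on other data. A stricter variant, sketched here, wraps the pattern in stringr's fixed(). The helper name contains_pair and the two-digit example string are made up for illustration.

```r
library(stringr)

triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
names(triples) <- triples
pairs <- c("(5, 6)", "(3, 5)")
names(pairs) <- pairs

# Hypothetical helper: match each pair as a literal string, not as a regex
contains_pair <- function(x, y) str_detect(x, fixed(y))

literal_result <- outer(X = triples, Y = pairs, FUN = contains_pair)
literal_result

# Made-up input showing the difference between the two modes
tricky <- "(15, 6)(1, 2)(3, 4)"
as_regex <- str_detect(tricky, "(5, 6)")           # TRUE: the regex matches "5, 6" inside "15, 6"
as_literal <- str_detect(tricky, fixed("(5, 6)"))  # FALSE: the literal text "(5, 6)" is absent
```

For the vectors in the question both variants return the same matrix; the difference only shows up once the numbers can have more than one digit.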