Creating matrix from 2 vectors with matrix values depending on vector elements

I have two vectors

seqX <- c("seq1","seq2","seq3")
seqY <- c("seqA","seqB")

and want to create a matrix over all seqX x seqY combinations, in which each cell holds the value that a user-defined function returns for that cell's pair of elements. For example, a user function like this

my.function.example <- function (seqX,seqY) {
   return(paste(seqX,"-",seqY))
}

should produce a result that looks like this:

     [,1]      [,2]      [,3]
[1,] seq1-seqA seq2-seqA seq3-seqA
[2,] seq1-seqB seq2-seqB seq3-seqB

I am lost with tapply, sapply, etc., so I am asking for your help. My two questions are:

How can I create the matrix and fill it by applying such a function to the two vectors?
How can I do that most efficiently when seqX and seqY are the same vector (e.g. after seqX <- seqY), so that every combination occurs twice and would be calculated redundantly?

Your help is highly appreciated - thank you in advance!

Kind regards, Martin

CodePudding user response：

You can use outer.

t(outer(seqX, seqY, my.function.example))
#     [,1]          [,2]          [,3]         
#[1,] "seq1 - seqA" "seq2 - seqA" "seq3 - seqA"
#[2,] "seq1 - seqB" "seq2 - seqB" "seq3 - seqB"

t(outer(seqX, seqY, paste, sep="-"))
#     [,1]        [,2]        [,3]       
#[1,] "seq1-seqA" "seq2-seqA" "seq3-seqA"
#[2,] "seq1-seqB" "seq2-seqB" "seq3-seqB"
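
Note that outer calls the function once with the fully expanded vectors, so the function has to be vectorized (paste is). If the user function only works on single values, wrapping it in Vectorize is one option. A minimal sketch, assuming a hypothetical scalar-only function scalar.example that is not part of the question:

```r
seqX <- c("seq1", "seq2", "seq3")
seqY <- c("seqA", "seqB")

# Hypothetical function for illustration only: the `if` on x == y
# means it handles one pair at a time and is not vectorized.
scalar.example <- function(x, y) {
  if (x == y) return(NA_character_)
  paste(x, y, sep = "-")
}

# Vectorize() wraps the function so that it is applied pair by pair
# to the expanded vectors that outer() passes in.
res <- t(outer(seqX, seqY, Vectorize(scalar.example)))
res
```

Vectorize calls the function once per pair internally, so this is likely slower than a function that is vectorized by itself.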

Benchmark

bench::mark(check=FALSE,
         GKi = t(outer(seqX, seqY, my.function.example)),
         Shree = sapply(seqX, function(x) my.function.example(x, seqY)), #Adapted
         Peter = with(expand.grid(a=seqX, b=seqY), matrix(my.function.example(a, b), length(seqY), byrow=TRUE)) #Adapted
         )

Result

  expression      min  median itr/s…¹ mem_a…² gc/se…³ n_itr  n_gc total…⁴ result
  <bch:expr> <bch:tm> <bch:t>   <dbl> <bch:b>   <dbl> <int> <dbl> <bch:t> <list>
1 GKi          7.47µs  11.1µs  73248.      0B    51.3  9993     7   136ms <NULL>
2 Shree       22.09µs    30µs  28128.  4.95KB    62.0  9978    22   355ms <NULL>
3 Peter        46.1µs    76µs  11531.      0B    70.4  4750    29   412ms <NULL>

In this benchmark, GKi (outer) is about 2.5 times faster than Shree (sapply) and about 6 times faster than Peter (expand.grid).

CodePudding user response：

seqX <- c("seq1","seq2","seq3")
seqY <- c("seqA","seqB")

xy <- expand.grid(x = seqX, y = seqY)

matrix(paste(xy$x, xy$y, sep = "-"), nrow = 2, byrow = TRUE)
#>      [,1]        [,2]        [,3]       
#> [1,] "seq1-seqA" "seq2-seqA" "seq3-seqA"
#> [2,] "seq1-seqB" "seq2-seqB" "seq3-seqB"


seqY <- seqX

xy <- expand.grid(x = seqX, y = seqY)

mx <- matrix(paste(xy$x, xy$y, sep = "-"), nrow = 3, byrow = TRUE)

mx[upper.tri(mx)] <- NA

mx

#>      [,1]        [,2]        [,3]       
#> [1,] "seq1-seq1" NA          NA         
#> [2,] "seq1-seq2" "seq2-seq2" NA         
#> [3,] "seq1-seq3" "seq2-seq3" "seq3-seq3"

Created on 2022-10-16 with reprex v2.0.2
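
Setting the upper triangle to NA afterwards still evaluates every pair first. To address the redundancy part of the question, here is a minimal sketch that only evaluates the pairs of the lower triangle (including the diagonal). It assumes seqX and seqY are identical and uses paste as a stand-in for the real, vectorized user function:

```r
seqX <- c("seq1", "seq2", "seq3")
n <- length(seqX)

# Start from an empty character matrix and find the (row, col)
# positions of the lower triangle, diagonal included.
mx <- matrix(NA_character_, n, n)
idx <- which(lower.tri(mx, diag = TRUE), arr.ind = TRUE)

# Evaluate the function only for those pairs. The column index picks
# the first element and the row index the second, as in the layout above.
mx[idx] <- paste(seqX[idx[, "col"]], seqX[idx[, "row"]], sep = "-")
mx
```

For three elements this evaluates 6 pairs instead of 9. If the function is symmetric, the upper triangle could be filled from the transposed matrix afterwards instead of being left as NA.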

CodePudding user response：

Here's one way with sapply:

sapply(seqX, function(x) paste(x, "-", seqY))

     seq1          seq2          seq3         
[1,] "seq1 - seqA" "seq2 - seqA" "seq3 - seqA"
[2,] "seq1 - seqB" "seq2 - seqB" "seq3 - seqB"
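
Because seqX is an unnamed character vector, sapply uses its elements as column names. A small sketch, assuming you also want to label the rows so that a cell can be looked up by its two elements (sep = "-" is used here to match the output requested in the question):

```r
seqX <- c("seq1", "seq2", "seq3")
seqY <- c("seqA", "seqB")

m <- sapply(seqX, function(x) paste(x, seqY, sep = "-"))

# Columns are already named after seqX; name the rows after seqY.
rownames(m) <- seqY

# Look up a single cell by name.
m["seqB", "seq2"]
```

This relies on seqY having more than one element. With a single element, sapply returns a plain vector rather than a matrix.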