I have two vectors
seqX <- c("seq1","seq2","seq3")
seqY <- c("seqA","seqB")
and want to create a seqX x seqY matrix in which each cell holds the value that a user function returns for the seqX/seqY combination belonging to that cell. For example, a user function like this
my.function.example <- function (seqX,seqY) {
return(paste(seqX,"-",seqY))
}
should produce a result that looks like this:
     [,1]      [,2]      [,3]
[1,] seq1-seqA seq2-seqA seq3-seqA
[2,] seq1-seqB seq2-seqB seq3-seqB
I am lost with tapply, sapply, etc., so I am asking for your help. My two questions are:
- How can I create the matrix and apply such a function which relies on the two vectors to it?
- How can I do that most efficiently in case seqX <- seqY (which means that all combinations would occur twice and I would have to calculate them redundantly)?
Your help is highly appreciated - thank you in advance!
Kind regards, Martin
CodePudding user response:
You can use outer.
t(outer(seqX, seqY, my.function.example))
#     [,1]          [,2]          [,3]
#[1,] "seq1 - seqA" "seq2 - seqA" "seq3 - seqA"
#[2,] "seq1 - seqB" "seq2 - seqB" "seq3 - seqB"
t(outer(seqX, seqY, paste, sep="-"))
#     [,1]        [,2]        [,3]
#[1,] "seq1-seqA" "seq2-seqA" "seq3-seqA"
#[2,] "seq1-seqB" "seq2-seqB" "seq3-seqB"
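One thing to keep in mind: outer calls the function a single time with two long vectors holding every combination, so the function has to be vectorized. paste is, but a function that handles only one pair at a time is not. A minimal sketch, assuming a hypothetical scalar-only function shared.chars (not from the question), wraps it in Vectorize first:

```r
seqX <- c("seq1", "seq2", "seq3")
seqY <- c("seqA", "seqB")

# Hypothetical function that only works on ONE pair of strings:
# counts the distinct characters the two strings have in common
shared.chars <- function(x, y) {
  length(intersect(strsplit(x, "")[[1]], strsplit(y, "")[[1]]))
}

# Vectorize() makes the function run once per (x, y) pair,
# which is the form outer() expects
res <- t(outer(seqX, seqY, Vectorize(shared.chars)))
dimnames(res) <- list(seqY, seqX)
res
```

Vectorize uses mapply internally, so this is convenient rather than fast; if the function can be written in a truly vectorized way, that should be the better choice.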
Benchmark
bench::mark(check=FALSE,
  GKi = t(outer(seqX, seqY, my.function.example)),
  Shree = sapply(seqX, function(x) my.function.example(x, seqY)), #Adapted
  Peter = with(expand.grid(a=seqX, b=seqY),
               matrix(my.function.example(a, b), length(seqY), byrow=TRUE)) #Adapted
)
Result
  expression      min  median itr/s…¹ mem_a…² gc/se…³ n_itr  n_gc total…⁴ result
  <bch:expr> <bch:tm> <bch:t>   <dbl> <bch:b>   <dbl> <int> <dbl> <bch:t> <list>
1 GKi          7.47µs  11.1µs  73248.      0B    51.3  9993     7   136ms <NULL>
2 Shree       22.09µs    30µs  28128.  4.95KB    62.0  9978    22   355ms <NULL>
3 Peter        46.1µs    76µs  11531.      0B    70.4  4750    29   412ms <NULL>
In this run, GKi is roughly 2.5 to 3 times faster than Shree and 6 to 7 times faster than Peter, depending on whether you compare median times or iterations per second.
CodePudding user response:
seqX <- c("seq1","seq2","seq3")
seqY <- c("seqA","seqB")
xy <- expand.grid(x = seqX, y = seqY)
matrix(paste(xy$x, xy$y, sep = "-"), nrow = 2, byrow = TRUE)
#>      [,1]        [,2]        [,3]
#> [1,] "seq1-seqA" "seq2-seqA" "seq3-seqA"
#> [2,] "seq1-seqB" "seq2-seqB" "seq3-seqB"
seqY <- seqX
xy <- expand.grid(x = seqX, y = seqY)
mx <- matrix(paste(xy$x, xy$y, sep = "-"), nrow = 3, byrow = TRUE)
mx[upper.tri(mx)] <- NA
mx
#>      [,1]        [,2]        [,3]
#> [1,] "seq1-seq1" NA          NA
#> [2,] "seq1-seq2" "seq2-seq2" NA
#> [3,] "seq1-seq3" "seq2-seq3" "seq3-seq3"
Created on 2022-10-16 with reprex v2.0.2
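Note that the code above still evaluates all n^2 combinations and only blanks out the upper triangle afterwards. If the function is expensive, it may be worth computing just the lower triangle. A minimal sketch, assuming seqY equals seqX and using paste as a stand-in for the real function (the names n and idx are mine):

```r
seqX <- c("seq1", "seq2", "seq3")
n <- length(seqX)

# Row/column indices of the lower triangle, diagonal included
idx <- which(lower.tri(diag(n), diag = TRUE), arr.ind = TRUE)

# Start from an all-NA matrix and fill only those cells, so only
# n * (n + 1) / 2 combinations are computed instead of n^2
mx <- matrix(NA_character_, n, n)
mx[idx] <- paste(seqX[idx[, "col"]], seqX[idx[, "row"]], sep = "-")
mx
```

If the real function is symmetric and the full matrix is needed, the missing half can be copied over afterwards with mx[upper.tri(mx)] <- t(mx)[upper.tri(mx)].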
CodePudding user response:
Here's one way with sapply:
sapply(seqX, function(x) paste(x, "-", seqY))
     seq1          seq2          seq3
[1,] "seq1 - seqA" "seq2 - seqA" "seq3 - seqA"
[2,] "seq1 - seqB" "seq2 - seqB" "seq3 - seqB"
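As a possible variation on the same idea, vapply lets you state the expected shape of each column up front, and the rows can be labelled with seqY so both dimensions are named. This is only a sketch; the sep = "-" is there to match the output asked for in the question:

```r
seqX <- c("seq1", "seq2", "seq3")
seqY <- c("seqA", "seqB")

# Each call returns one column of length(seqY) strings;
# vapply() checks that against the declared template
res <- vapply(seqX,
              function(x) paste(x, seqY, sep = "-"),
              character(length(seqY)))

# Columns are already named after seqX; label the rows with seqY
rownames(res) <- seqY
res
```

With both dimnames set, a single cell can be looked up by name, e.g. res["seqB", "seq2"].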