I am currently facing this problem while working on R.
I have a big matrix, say A, of size (M x K) and a smaller matrix, say B, of size (N x K), with of course N < M. So B has the same number of columns as A, but less rows.
A possible example can be:
A <- replicate(K, rnorm(M))
B <- replicate(K, sample(M, N))
My goal is to subset each column of A with the correspondent column of B in an efficent way, since the dimension of the matrices is huge.
A possible numerical example is:
A <- [1 2 3 B <- [1 1 3 ---> [1 2 9
4 5 6 3 2 2] 7 5 6]
7 8 9]
This means I am looking for something better than parsing column-by-column using a simple for loop:
for(i in 1:K)
{
A[B[,i],i]
}
Thank you for helping me to solve this apparently trivial, but for me challenging problem.
CodePudding user response:
See this question and ?'['
:
When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
For your example, it would be:
A <- matrix(1:9, 3, byrow = TRUE)
B <- matrix(c(1, 1, 3, 3, 2, 2), 2, byrow = TRUE)
C <- matrix(A[cbind(c(B), rep(1:ncol(B), each = nrow(B)))], nrow(B))
UPDATE: For efficiency, Mossa is right. There's not going to be much of a difference in performance in this case between using a matrix of indices and looping. Almost all of your time is in generating A
. Generating B
even takes longer than generating C
:
M <- K <- 1e4; N <- 1e3
> system.time(A <- replicate(K, rnorm(M)))
user system elapsed
7.110 0.922 8.030
> system.time(B <- replicate(K, sample(M, N)))
user system elapsed
0.727 0.033 0.760
> system.time(C <- matrix(A[cbind(c(B), rep(1:ncol(B), each = nrow(B)))], nrow(B)))
user system elapsed
0.474 0.186 0.659
You can speed up the generation of A
slightly by sampling all in one go, but you don't have much room for improvement in any of the A
, B
, C
operations:
> system.time(A <- matrix(rnorm(M*K), M))
user system elapsed
6.82 0.74 7.56
CodePudding user response:
I believe that you've arrived at a pretty reasonable method:
C <- matrix(NA, nrow = N, ncol = K)
for (i in seq_len(K))
{
C[, i] <- A[B[, i], i]
}
C