Quickest way to do an operation on every two (different) columns of a matrix in R?

Time:12-17

Suppose we have an n-by-4 matrix, and we wish to find the average of every 2 different columns of this matrix. In combinatorial terms, we want to enumerate all the ways to choose 2 columns out of 4 and compute the average (or any other operation) for each pair. There are choose(4, 2) = 6 such combinations: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4), and we would then conduct the operation of interest on each of the 6 pairs. How can this be extended to a general n-by-m matrix in R?
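For the general n-by-m case, base R's `combn()` can enumerate the choose(m, 2) column pairs and, through its `FUN` argument, call a function once per pair. The sketch below wraps that in a small helper; `pairwise_apply` is a hypothetical name used only for illustration (it is not a base R function), and it assumes the operation takes two columns and returns a single number.

```r
# Hypothetical helper (not part of base R): apply f(x, y) to every unordered
# pair of distinct columns of `mat`; f is assumed to return a single number.
pairwise_apply <- function(mat, f) {
  m <- ncol(mat)
  # combn() calls FUN once per pair, passing the two column indices as `idx`
  res <- combn(m, 2, FUN = function(idx) f(mat[, idx[1]], mat[, idx[2]]))
  res <- as.vector(res)   # drop any array dimension combn() attaches
  # label each result with its column pair: "1-2", "1-3", ...
  names(res) <- apply(combn(m, 2), 2, paste, collapse = "-")
  res
}

# Illustrative data only
set.seed(1)
mat <- matrix(runif(5 * 4), nrow = 5, ncol = 4)

# Average of all entries in each pair of columns
avg_pairs <- pairwise_apply(mat, function(x, y) mean(c(x, y)))
# "Any other operation", e.g. the correlation between the two columns
cor_pairs <- pairwise_apply(mat, cor)
avg_pairs
cor_pairs
```

Passing the operation in as `f` keeps the pair enumeration separate from the computation, so swapping `mean` for `cor` or any other two-argument function needs no other changes.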

Thanks.

Answer:

This is likely not the quickest way (as requested in the title), but this approach is clear and flexible.

Here I assume m = 4 and that the calculation of interest is the sum of the means of the two columns:

# create example data (n-by-m matrix of values)
n <- 200
m <- 4
mat <- matrix(runif(n*m), nrow=n, ncol=m)

# get all column pairs
pairs <- t(combn(m, 2))
P <- nrow(pairs)

# allocate a numeric vector to hold the results
result <- numeric(P)

# loop over column pairs
for (p in seq_len(P)) {
    i <- pairs[p, 1]
    j <- pairs[p, 2]
    result[p] <- mean(mat[, i]) + mean(mat[, j])
}

# view result
cbind(pairs, result)
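Because this particular operation (the sum of two column means) depends only on per-column means, a faster option is to compute every column mean once with `colMeans()` and then combine them by indexing, with no per-pair loop. This is a sketch under that assumption; it does not carry over to operations that need both full columns at once (such as a correlation).

```r
# Assumes the operation is mean(mat[, i]) + mean(mat[, j]), as in the loop above.
set.seed(1)                      # illustrative data only
n <- 200
m <- 4
mat <- matrix(runif(n * m), nrow = n, ncol = m)

pairs <- t(combn(m, 2))          # P-by-2 matrix of column indices
col_means <- colMeans(mat)       # length-m vector, computed once

# Vectorised lookup: one result per row of `pairs`
result_fast <- col_means[pairs[, 1]] + col_means[pairs[, 2]]

cbind(pairs, result_fast)
```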

Answer:

DanY's solution is spot on. For a really large dataset, an apply() function might be faster. Here is DanY's solution rewritten with apply():

# create example data (n-by-m matrix of values)
n <- 200
m <- 4
mat <- matrix(runif(n*m), nrow=n, ncol=m)

# get all column pairs
pairs <- t(combn(m, 2))
P <- nrow(pairs)

# apply() over the rows of pairs builds the result vector itself,
# so no pre-allocation is needed
result <- apply(pairs, 1, function(x) {
  mean(mat[, x[1]]) + mean(mat[, x[2]])
})

# view result
cbind(pairs, result)
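The answers above reduce each pair of columns to a single number. If "the average of two columns" is instead read as their element-wise (row-wise) average, the result is a new length-n column per pair. A minimal sketch of that alternative reading, using `vapply()` so that every pair is checked to return exactly n values:

```r
set.seed(1)                      # illustrative data only
n <- 200
m <- 4
mat <- matrix(runif(n * m), nrow = n, ncol = m)

pairs <- t(combn(m, 2))          # P-by-2 matrix of column indices
P <- nrow(pairs)

# For each pair (i, j), rowMeans() gives (mat[, i] + mat[, j]) / 2.
# drop = FALSE keeps the two selected columns as a matrix even if n == 1.
# FUN.VALUE = numeric(n) makes vapply() error if any result has the wrong shape.
pair_rowavg <- vapply(seq_len(P),
                      function(p) rowMeans(mat[, pairs[p, ], drop = FALSE]),
                      FUN.VALUE = numeric(n))

# One column per pair, labelled "1-2", "1-3", ...
colnames(pair_rowavg) <- paste(pairs[, 1], pairs[, 2], sep = "-")
dim(pair_rowavg)
```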