Suppose we have an n-by-4 matrix and wish to find the average of every pair of distinct columns. In combinatorial terms, we want to enumerate all the ways of choosing 2 columns out of 4 and compute the average (or any other operation) for each pair. There are 6 such combinations: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4),
and we would then perform the operation of interest on each of the 6 pairs. How can this be extended to a general n-by-m matrix in R?
Thanks.
CodePudding user response:
This is likely not the quickest way (as requested in the title), but this approach is clear and flexible.
Here I assume m=4 and the calculation of interest is the sum of the means of the two columns:
# create example data (n-by-m matrix of values)
n <- 200
m <- 4
mat <- matrix(runif(n*m), nrow=n, ncol=m)
# get all column pairs
pairs <- t(combn(m, 2))
P <- nrow(pairs)
# allocate an "empty" vector to hold the results
result <- numeric(P)
# loop over column pairs
for (p in seq_len(P)) {
  i <- pairs[p, 1]
  j <- pairs[p, 2]
  result[p] <- mean(mat[, i]) + mean(mat[, j])
}
# view result
cbind(pairs, result)
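To extend this to any n-by-m matrix and any pairwise operation, one possible sketch wraps the same idea in a function. The names `pairwise_cols` and `f` are hypothetical (they do not come from the answer above), and the sketch assumes the matrix has at least 2 columns and that `f` returns a single number.

```r
# hypothetical helper: apply f to every pair of distinct columns of mat
# assumes ncol(mat) >= 2 and that f(a, b) returns one numeric value
pairwise_cols <- function(mat, f) {
  pairs <- t(combn(ncol(mat), 2))   # one row per column pair (i < j)
  result <- vapply(seq_len(nrow(pairs)), function(p) {
    f(mat[, pairs[p, 1]], mat[, pairs[p, 2]])
  }, numeric(1))
  cbind(pairs, result)
}

# example: average of all values in the two columns, for a 200-by-5 matrix
set.seed(1)
mat5 <- matrix(runif(200 * 5), nrow = 200, ncol = 5)
out <- pairwise_cols(mat5, function(a, b) mean(c(a, b)))
out
```

Passing the operation as a function keeps the pair enumeration separate from the calculation, so swapping in a correlation or a difference of medians only changes the `f` argument.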
CodePudding user response:
DanY's solution is spot on. I would suggest that for a really large dataset an apply() call might be faster, although that is worth benchmarking on your own data. Here is DanY's solution with apply() in place of the loop:
# create example data (n-by-m matrix of values)
n <- 200
m <- 4
mat <- matrix(runif(n*m), nrow=n, ncol=m)
# get all column pairs
pairs <- t(combn(m, 2))
P <- nrow(pairs)
# apply over the rows of pairs instead of looping (no preallocation needed)
result <- apply(pairs, 1, function(x) {
  mean(mat[, x[1]]) + mean(mat[, x[2]])
})
# view result
cbind(pairs, result)
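Because this particular calculation depends only on each column's mean, a loop-free variant is also possible: compute every column mean once with `colMeans()` and index the result by the pair table. This is only a sketch, and it relies on the assumption that the operation can be expressed through per-column summaries; it does not carry over to operations that need both full columns at once.

```r
# create example data (n-by-m matrix of values)
set.seed(42)
n <- 200
m <- 4
mat <- matrix(runif(n * m), nrow = n, ncol = m)

# get all column pairs
pairs <- t(combn(m, 2))

# compute each column mean once, then add the means for each pair
col_means <- colMeans(mat)
result_vec <- col_means[pairs[, 1]] + col_means[pairs[, 2]]

# view result
cbind(pairs, result_vec)
```

With m columns this computes m means rather than two means per pair, which matters more as m grows, since the number of pairs is m(m-1)/2.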