I need to select a column from a list, cbind
the columns, and perform a function on rows of such a combined dataset. I need to do this consequently for all columns. Inspired by the answer here, I came up with a possible solution for one column:
x <- apply(Reduce("cbind", lapply(L, FUN = function(x) x[, 1])), 1, FUN = sd)
It is clunky and gets worse when expanded to include all columns. Let's have a list of matrices:
set.seed(2385737)
L = list(matrix(rnorm(30), ncol = 3), matrix(rnorm(30), ncol = 3), matrix(rnorm(30), ncol = 3))
X <- matrix(c(apply(Reduce("cbind", lapply(L, FUN = function(x) x[, 1])), 1, FUN = sd),
apply(Reduce("cbind", lapply(L, FUN = function(x) x[, 2])), 1, FUN = sd),
apply(Reduce("cbind", lapply(L, FUN = function(x) x[, 3])), 1, FUN = sd)),
ncol = 3
)
I can generalise the code above into:
X <- sapply(1:ncol(L[[1]]),
FUN = function(i) apply(Reduce("cbind",
lapply(L, FUN = function(x) x[, i])), 1, FUN = sd))
Is there a clean way how to approach the calculation consequently for all columns across a list?
CodePudding user response:
One option would be to stack the list of matrices into a single 3D array and perform the calculations directly on this array using apply
. Although rarely used, the MARGIN
argument in apply
can be fed a vector of margin indices that allows calculations to be done on any dimension of the array, so using MARGIN = c(1, 2)
will perform the FUN
on the vectors along the third dimension.
This allows the whole thing to be done as a one-liner if you use the function abind
from the abind
package to create the array from your list.
apply(do.call(abind::abind, c(L, along = 3)), c(1, 2), FUN = sd)
#> [,1] [,2] [,3]
#> [1,] 0.5040136 0.1593154 0.9371359
#> [2,] 1.2781308 0.5380104 1.1967232
#> [3,] 1.3355753 0.5445188 0.8851976
#> [4,] 1.5333570 0.9800276 0.5928828
#> [5,] 1.4844418 2.1511425 1.6904784
#> [6,] 1.5158726 2.0156800 1.3566559
#> [7,] 0.8452233 0.3058013 1.0896865
#> [8,] 0.5742021 0.8816770 1.4622064
#> [9,] 1.7673249 0.9863849 1.1386831
#> [10,] 0.9001773 1.0793596 0.5754467
This is the same result as X
in your example above.
If you prefer to use base R without extra packages, you can create your array directly:
apply(array(unlist(L), c(nrow(L[[1]]), ncol(L[[1]]), length(L))), c(1, 2), sd)
This gives the same result.
CodePudding user response:
Benchmarking:
set.seed(1); L = lapply(1:100, function(i) matrix(rnorm(1000000), ncol = 1000))
microbenchmark::microbenchmark(
zx = {
sapply(
#transpose
lapply(seq(ncol(L[[1]])), function(i)
sapply(seq_along(L), function(j)
L[[ j ]][, i ]
)),
#apply function
function(i) apply(i, 1, sd))
},
sotos = {
i1 <- seq(1, ncol(L[[1]]) * length(L), (ncol(L[[1]]) * length(L))/length(L))
sapply(seq(0, (length(L)-1)), \(i)apply(do.call(cbind, L)[,i1 i], 1, sd))
},
allanAbind = {
apply(do.call(abind::abind, c(L, along = 3)), c(1, 2), FUN = sd)
},
allanBase = {
apply(array(unlist(L), c(nrow(L[[1]]), ncol(L[[1]]), length(L))), c(1, 2), sd)
},
times = 10
)
#Unit: seconds
# expr min lq mean median uq max neval
# zx 19.66990 21.16743 24.33954 23.99107 27.29293 30.42287 10
# sotos 43.45282 45.93170 48.76219 48.05993 51.02062 55.16740 10
# allanAbind 19.81033 21.69177 25.47289 23.96392 29.19223 35.03646 10
# allanBase 21.69405 22.26512 26.29049 23.90017 28.06595 39.60385 10
CodePudding user response:
"Transpose" the list so that we have 3 (number of columns in a matrix) lists where each list contains nth column from original list L, i.e.: first one will have all first columns from all matrices.
Then loop through that list, and apply function per row:
sapply(
#transpose
lapply(seq(ncol(L[[1]])), function(i)
sapply(seq_along(L), function(j)
L[[ j ]][, i ]
)),
#apply function
function(i) apply(i, 1, sd))
# [,1] [,2] [,3]
# [1,] 0.5040136 0.1593154 0.9371359
# [2,] 1.2781308 0.5380104 1.1967232
# [3,] 1.3355753 0.5445188 0.8851976
# [4,] 1.5333570 0.9800276 0.5928828
# [5,] 1.4844418 2.1511425 1.6904784
# [6,] 1.5158726 2.0156800 1.3566559
# [7,] 0.8452233 0.3058013 1.0896865
# [8,] 0.5742021 0.8816770 1.4622064
# [9,] 1.7673249 0.9863849 1.1386831
# [10,] 0.9001773 1.0793596 0.5754467
CodePudding user response:
Another way to go at it would be to create a sequence with the columns to be binded (i.e. in your case 1, 4, 7 - 2, 5, 8 - 3, 6, 9) and apply the sd
function rowwise to each of the combinations, i.e.
i1 <- seq(1, ncol(L[[1]])* length(L), (ncol(L[[1]])*length(L))/length(L))
sapply(seq(0, (length(L)-1)), \(i)apply(do.call(cbind, L)[,i1 i], 1, sd))
[,1] [,2] [,3]
[1,] 0.5040136 0.1593154 0.9371359
[2,] 1.2781308 0.5380104 1.1967232
[3,] 1.3355753 0.5445188 0.8851976
[4,] 1.5333570 0.9800276 0.5928828
[5,] 1.4844418 2.1511425 1.6904784
[6,] 1.5158726 2.0156800 1.3566559
[7,] 0.8452233 0.3058013 1.0896865
[8,] 0.5742021 0.8816770 1.4622064
[9,] 1.7673249 0.9863849 1.1386831
[10,] 0.9001773 1.0793596 0.5754467