How can I build a for-loop that sums n rows from a super matrix and results in a cumulative matrix?

Time:12-11

I've got a matrix of 68 columns and almost 43000 rows in R. It's basically a huge matrix composed of smaller 68*68 matrices stacked on top of each other. I need the mean matrix of every 15 smaller matrices (each set of 15 matrices belongs to one participant). For the first participant that means rows 1-68, 69-136, etc., up to row 1020 (= 15*68). I can't figure out how to write a for loop that takes every 68 rows and sums them with the next 68, and so on, while still keeping a 68*68 matrix. The only way I was able to sum them correctly was by indexing the specific rows, but as I have 43000 rows and this dataset is the first of 30 files, I don't want to keep indexing by hand.

Can anybody help me find an easy/fast way to do this?

EDIT: So an example of the data would be:

> print(Matrix_Alpha_ami[1:3,1:5])
         V1     V2     V3     V4     V5
[1,] 0.0000 0.4749 0.5629 0.6339 0.5406
[2,] 0.4749 0.0000 0.3157 0.5234 0.4737
[3,] 0.5629 0.3157 0.0000 0.5707 0.4191

> print(Matrix_Alpha_ami[69:71,1:5])
         V1     V2     V3     V4     V5
[69,] 0.0000 0.4993 0.4812 0.5227 0.5018
[70,] 0.4993 0.0000 0.5444 0.6106 0.3324
[71,] 0.4812 0.5444 0.0000 0.5818 0.4107

The columns continue until V68 and the rows go down until 42k

The first bit of data is the beginning of matrix 1, the second bit of matrix 2. The problem is that they're not individual matrices but part of one big one. Because of this I can't just say m1*m2.

In the end I need a mean matrix of 15 matrices, i.e. the average of all measurements (n=15) of one participant. As an example, from the example data I would get ((m1 + m2)/2):

          V1      V2      V3      V4      V5
[1,] 0.00000 0.48710 0.52205 0.57830 0.52120
[2,] 0.48710 0.00000 0.43005 0.56700 0.40305
[3,] 0.52205 0.43005 0.00000 0.57625 0.41490

CodePudding user response:

I am not sure I understood correctly what you are trying to do, but here is a vector containing the row index ranges for subsetting your matrix:

number <- 1020

a <- (seq(1,number,68))
n <- as.numeric(length(a))
b <- vector()

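# end row of each block = start row of the next block minus 1
for (i in 1:n) {
  b[i] <- a[i + 1] - 1
}

# a[n + 1] is NA, so set the last end row explicitly
b[n] <- number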

c <- paste(a,b, sep = ":")

c

[1] "1:68"     "69:136"   "137:204"  "205:272"  "273:340" 
[6] "341:408"  "409:476"  "477:544"  "545:612"  "613:680" 
[11] "681:748"  "749:816"  "817:884"  "885:952"  "953:1020"
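
The index ranges above could drive the loop the question asks for. A minimal sketch, assuming the blocks of one participant sit in consecutive rows; `block_mean` and the small matrix `demo` are hypothetical names used only for illustration:

```r
# Average `n_mats` stacked blocks of `d` rows each, starting at row `first_row` of `x`.
block_mean <- function(x, d, n_mats, first_row = 1L) {
  total <- matrix(0, nrow = d, ncol = ncol(x))
  for (k in seq_len(n_mats)) {
    start <- first_row + (k - 1L) * d          # first row of block k
    total <- total + x[start:(start + d - 1L), , drop = FALSE]
  }
  total / n_mats                               # element-wise mean of the blocks
}

# Demo data (hypothetical): three 2-by-2 blocks stacked row-wise
demo <- rbind(matrix(1, 2, 2), matrix(2, 2, 2), matrix(6, 2, 2))
block_mean(demo, d = 2L, n_mats = 3L)
```

For the data in the question, the call would presumably be `block_mean(Matrix_Alpha_ami, d = 68L, n_mats = 15L)` for the first participant, with `first_row = (p - 1) * 1020 + 1` for participant `p`.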

CodePudding user response:

If I understand the question correctly, the following function might answer it.
The function splits the input matrix into one block of rows per participant, then splits each block into its sub-matrices and computes the row means of each sub-matrix.
The return value is a list with one matrix of means per participant (one row per sub-matrix).

funMean <- function(x, rows, matrices){
  f <- c(1, rep(0, rows*matrices - 1L))
  f <- rep(f, length.out = nrow(x))
  f <- cumsum(f)
  #
  g <- c(1, rep(0, rows - 1L))
  g <- rep(g, matrices)
  g <- cumsum(g)
  #
  x <- split(x, f)
  x <- lapply(x, matrix, ncol = rows)
  y <- lapply(x, \(X){
    z <- split(X, g)
    z <- lapply(z, matrix, ncol = rows)
    t(sapply(z, rowMeans, na.rm = TRUE))
  })
  y
}

With the fake data defined under "Test data" below, the sub-matrices are Rows*Rows instead of 68*68 and there are Mats of them per participant instead of 15.

Rows <- 5
Mats <- 3

y <- funMean(Matrix_Alpha_ami, Rows, Mats)
y[[1]]
#        [,1]        [,2]         [,3]         [,4]        [,5]
#1 -0.9407742  0.40359467  0.233171598 -0.004998849  0.54604432
#2 -1.1864782 -0.58013231  0.004050014  0.102806769 -0.28126799
#3 -0.2209807  0.08324887 -0.417034539  0.695183587  0.02515484


do.call(rbind, y)

Test data

set.seed(2021)
Matrix_Alpha_ami <- lapply(1:10, \(k){
  matrix(rnorm(Mats*Rows^2), ncol =  Rows)
})
Matrix_Alpha_ami <- do.call(rbind, Matrix_Alpha_ami)
dim(Matrix_Alpha_ami)
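
If what is needed is instead the element-wise mean across the matrices of each participant, as in the question's two-matrix averaging example, one alternative is to reshape the stacked matrix into a 4-dimensional array. This is only a sketch and not the function above; `participant_means` is a hypothetical name, and it assumes a numeric matrix whose row count is an exact multiple of rows*matrices:

```r
participant_means <- function(x, rows, matrices) {
  x <- as.matrix(x)
  n_part <- nrow(x) / (rows * matrices)
  stopifnot(n_part == round(n_part))
  # Rows of x run fastest over row-within-block, then block, then participant
  a <- array(x, dim = c(rows, matrices, n_part, ncol(x)))
  # Reorder the dimensions to: row, column, block, participant
  a <- aperm(a, c(1, 4, 2, 3))
  # Average over the block dimension; result is rows x ncol(x) x participants
  apply(a, c(1, 2, 4), mean)
}

# Small check data (hypothetical): 2 participants, 2 blocks each, 2-by-2 blocks
blocks <- list(matrix(1, 2, 2), matrix(3, 2, 2), matrix(10, 2, 2), matrix(20, 2, 2))
res <- participant_means(do.call(rbind, blocks), rows = 2, matrices = 2)
res[, , 1]  # mean matrix of participant 1
res[, , 2]  # mean matrix of participant 2
```

The result is a 3-dimensional array, so `res[, , p]` is the mean matrix of participant `p`.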

CodePudding user response:

This solution uses a series of index vectors to build a 15-row matrix, with one column per matrix cell, to pass to colMeans. It should run very fast.

d <- 68L
s <- 15L
# traceable example matrix--replace m with your matrix
m <- matrix(1:d, nrow = d*s*10, ncol = d)
m <- m + (col(m) - 1)*d + ((row(m) - 1) %/% (d*s))*d^2

# solution
i1 <- seq(1L, by = d*s, length.out = nrow(m)/d/s) # top left index of the first d-by-d matrix for each individual
i2 <- sequence(rep(d, length(i1)), i1) # indices of left-most column of the first d-by-d matrix for each individual
i3 <- sequence(rep(d, length(i2)), i2, nrow(m)) # indices of the first d-by-d matrix for each individual
i4 <- sequence(rep(s, length(i3)), i3, d) # indices for averaging (by sets of s)
m2 <- matrix(colMeans(matrix(m[i4], nrow = s)), ncol = d, byrow = TRUE)
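
To check the indexing on a small case, the result can be compared against a direct average of the blocks. A sketch with small, hypothetical sizes (d = 3, s = 2, two participants) and random data; it assumes R >= 4.0.0, where `sequence()` accepts the `from` and `by` arguments:

```r
d <- 3L       # rows/columns per small matrix (hypothetical size)
s <- 2L       # matrices per participant (hypothetical size)
n_part <- 2L  # number of participants
set.seed(1)
m <- matrix(rnorm(d * s * n_part * d), ncol = d)

# Same indexing steps as in the solution above
i1 <- seq(1L, by = d * s, length.out = nrow(m) / d / s)
i2 <- sequence(rep(d, length(i1)), i1)
i3 <- sequence(rep(d, length(i2)), i2, nrow(m))
i4 <- sequence(rep(s, length(i3)), i3, d)
m2 <- matrix(colMeans(matrix(m[i4], nrow = s)), ncol = d, byrow = TRUE)

# Direct averages: participant 1 uses rows 1:3 and 4:6, participant 2 rows 7:9 and 10:12
direct1 <- (m[1:3, ] + m[4:6, ]) / 2
direct2 <- (m[7:9, ] + m[10:12, ]) / 2

# Participant p occupies rows ((p - 1) * d + 1):(p * d) of m2
check1 <- all.equal(m2[1:d, ], direct1)
check2 <- all.equal(m2[(d + 1):(2 * d), ], direct2)
```

Here `m2` stacks the mean matrices of all participants row-wise, so with the real data each participant's 68*68 mean matrix would be a consecutive block of 68 rows.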