I have a matrix of 68 columns and almost 43,000 rows in R. It is essentially one large matrix made up of smaller 68*68 matrices stacked on top of each other. I need the mean matrix of every 15 smaller matrices, since each set of 15 matrices belongs to one participant. For the first participant that means rows 1-68, 69-136, and so on up to row 1020 (= 15*68). I can't figure out how to write a for loop that takes each block of 68 rows and adds it to the next block of 68 rows while keeping the result a 68*68 matrix. The only way I have managed to sum them correctly is by indexing the specific rows by hand, but with 43,000 rows, and with this dataset being the first of 30 files, I don't want to keep doing that.
Can anybody help me find an easy and fast way to do this?
EDIT: So an example of the data would be:
> print(Matrix_Alpha_ami[1:3,1:5])
         V1     V2     V3     V4     V5
[1,] 0.0000 0.4749 0.5629 0.6339 0.5406
[2,] 0.4749 0.0000 0.3157 0.5234 0.4737
[3,] 0.5629 0.3157 0.0000 0.5707 0.4191
> print(Matrix_Alpha_ami[69:71,1:5])
          V1     V2     V3     V4     V5
[69,] 0.0000 0.4993 0.4812 0.5227 0.5018
[70,] 0.4993 0.0000 0.5444 0.6106 0.3324
[71,] 0.4812 0.5444 0.0000 0.5818 0.4107
The columns continue until V68 and the rows go down to about 42k.
The first bit of data is the beginning of matrix 1, the second bit is the beginning of matrix 2. The problem is that they're not individual matrices but part of one big one. Because of this I can't just say m1*m2.
In the end I need a mean matrix of 15 matrices, that is, an average of all measurements (n=15) of one participant. As an example, from the example data I would get ((m1 + m2)/2):
          V1      V2      V3      V4      V5
[1,] 0.00000 0.48710 0.52205 0.57830 0.52120
[2,] 0.48710 0.00000 0.43005 0.56700 0.40305
[3,] 0.52205 0.43005 0.00000 0.57625 0.41490
CodePudding user response:
I am not sure I understood correctly what you are trying to do, but I have created a vector containing the appropriate index ranges (as strings) for subsetting the rows of your matrix:
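number <- 1020                # last row of the first participant (15 * 68)
a <- seq(1, number, 68)       # first row of each 68-row block
n <- length(a)
b <- vector()
for (i in 1:n) {
  b[i] <- a[i + 1] - 1        # last row of block i (NA for the final block)
}
b[n] <- number                # set the end of the final block
c <- paste(a, b, sep = ":")
c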
[1] "1:68" "69:136" "137:204" "205:272" "273:340"
[6] "341:408" "409:476" "477:544" "545:612" "613:680"
[11] "681:748" "749:816" "817:884" "885:952" "953:1020"
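The strings above cannot be used for subsetting as they are, so one way to put the same block boundaries to work is sketched below. It keeps the numeric start rows instead of pasting them into text, adds the blocks up in a loop, and divides by the number of blocks. The helper name `mean_of_blocks` and the 3*3 toy data (taken from the excerpt in the question) are assumptions for illustration; for the real data `d` would be 68 and the first participant would span rows 1 to 1020.

```r
# Hypothetical helper, not part of the answer above: average consecutive
# d-row blocks of x between rows `first` and `last` (inclusive).
mean_of_blocks <- function(x, d, first = 1, last = nrow(x)) {
  starts <- seq(first, last, by = d)            # first row of every block
  total <- matrix(0, nrow = d, ncol = ncol(x))  # running element-wise sum
  for (s0 in starts) {
    total <- total + x[s0:(s0 + d - 1), , drop = FALSE]
  }
  total / length(starts)                        # element-wise mean
}

# Toy data: two stacked 3*3 blocks from the question's excerpt
m1 <- matrix(c(0.0000, 0.4749, 0.5629,
               0.4749, 0.0000, 0.3157,
               0.5629, 0.3157, 0.0000), nrow = 3, byrow = TRUE)
m2 <- matrix(c(0.0000, 0.4993, 0.4812,
               0.4993, 0.0000, 0.5444,
               0.4812, 0.5444, 0.0000), nrow = 3, byrow = TRUE)
big <- rbind(m1, m2)

res <- mean_of_blocks(big, d = 3)
```

For a second participant the same call would take `first = 1021` and `last = 2040`, assuming the participants are stacked in order.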
CodePudding user response:
If I understand the question correctly, the following function might answer it.
The function splits the input matrix into one block per participant, splits each block into its sub-matrices, and then computes the row means of each sub-matrix (the code calls rowMeans).
The return value is a list with one matrix of means per participant, with one row per sub-matrix.
funMean <- function(x, rows, matrices){
  # f: participant index for every row of x
  f <- c(1, rep(0, rows*matrices - 1L))
  f <- rep(f, length.out = nrow(x))
  f <- cumsum(f)
  # g: sub-matrix index for every row within one participant
  g <- c(1, rep(0, rows - 1L))
  g <- rep(g, matrices)
  g <- cumsum(g)
  # split by participant and rebuild each piece as a matrix
  x <- split(x, f)
  x <- lapply(x, matrix, ncol = rows)
  y <- lapply(x, \(X){
    z <- split(X, g)
    z <- lapply(z, matrix, ncol = rows)
    t(sapply(z, rowMeans, na.rm = TRUE))
  })
  y
}
With fake data, instead of 68*68 there are Rows*Rows and Mats instead of 15.
Rows <- 5
Mats <- 3
y <- funMean(Matrix_Alpha_ami, Rows, Mats)
y[[1]]
#        [,1]        [,2]         [,3]         [,4]        [,5]
#1 -0.9407742  0.40359467  0.233171598 -0.004998849  0.54604432
#2 -1.1864782 -0.58013231  0.004050014  0.102806769 -0.28126799
#3 -0.2209807  0.08324887 -0.417034539  0.695183587  0.02515484
do.call(rbind, y)
Test data
set.seed(2021)
Matrix_Alpha_ami <- lapply(1:10, \(k){
  matrix(rnorm(Mats*Rows^2), ncol = Rows)
})
Matrix_Alpha_ami <- do.call(rbind, Matrix_Alpha_ami)
dim(Matrix_Alpha_ami)
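To see how the two grouping vectors inside funMean line up with the rows, they can be built on their own for a small case. The sketch below assumes 2 rows per sub-matrix, 3 sub-matrices per participant and 12 rows in total (all three numbers are made up for illustration). It also shows that split() on a matrix treats it as a vector and recycles the grouping vector once per column, which is why each piece can be rebuilt with matrix().

```r
rows <- 2; matrices <- 3; n_total <- 12   # illustrative sizes only

# Participant index: a 1 marks the first row of each participant,
# and cumsum() turns the markers into running group numbers.
f <- cumsum(rep(c(1, rep(0, rows * matrices - 1L)), length.out = n_total))

# Sub-matrix index within a single participant
g <- cumsum(rep(c(1, rep(0, rows - 1L)), matrices))

# A small matrix whose entries are their own linear indices
x <- matrix(seq_len(n_total * rows), ncol = rows)

# split() recycles f down each column; matrix() restores the shape
parts <- lapply(split(x, f), matrix, ncol = rows)

print(f)
print(g)
print(parts[[1]])
```

Here `parts[[1]]` holds the first six rows of `x` and `parts[[2]]` the last six, so each list element is one participant's stacked block.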
CodePudding user response:
This solution uses a series of index vectors to build a 15-row matrix to pass to colMeans. It should run very fast.
d <- 68L
s <- 15L
# traceable example matrix--replace m with your matrix
m <- matrix(1:d, nrow = d*s*10, ncol = d)
m <- m + (col(m) - 1)*d + ((row(m) - 1) %/% (d*s))*d^2
# solution
i1 <- seq(1L, by = d*s, length.out = nrow(m)/d/s) # top left index of the first d-by-d matrix for each individual
i2 <- sequence(rep(d, length(i1)), i1) # indices of left-most column of the first d-by-d matrix for each individual
i3 <- sequence(rep(d, length(i2)), i2, nrow(m)) # indices of the first d-by-d matrix for each individual
i4 <- sequence(rep(s, length(i3)), i3, d) # indices for averaging (by sets of s)
m2 <- matrix(colMeans(matrix(m[i4], nrow = s)), ncol = d, byrow = TRUE)
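As a cross-check on the indexing approach, the same per-cell means can be obtained by reshaping, which is a different technique from the sequence() calls above. The sketch relies on R storing matrices in column-major order, so a stacked matrix with d rows per block, s blocks per participant and p participants can be viewed as a 4-dimensional array indexed by (row, block, participant, column). The function name `block_means_array` and the small sizes are assumptions for illustration; the toy matrix is built the same way as the traceable example.

```r
# Hypothetical alternative: average every s stacked d*d blocks via an array.
block_means_array <- function(x, d, s) {
  stopifnot(ncol(x) == d, nrow(x) %% (d * s) == 0)
  p <- nrow(x) %/% (d * s)               # number of participants
  a <- array(x, dim = c(d, s, p, d))     # (row, block, participant, column)
  a <- aperm(a, c(1, 4, 3, 2))           # (row, column, participant, block)
  rowMeans(a, dims = 3)                  # mean over blocks: d x d x p array
}

# Toy data: every block of participant k holds 1:d^2 shifted by (k - 1)*d^2
d <- 3L; s <- 4L; p <- 2L
m <- matrix(1:d, nrow = d * s * p, ncol = d)
m <- m + (col(m) - 1) * d + ((row(m) - 1) %/% (d * s)) * d^2

res <- block_means_array(m, d, s)
res[, , 1]   # mean matrix of the first participant
```

The result is a d x d x p array rather than the stacked matrix m2 produces, so `res[, , k]` is the mean matrix of participant k.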