Calculate mean in matrix subgroup (sub-matrix) in R-CodePudding

I have a large matrix, in which every row corresponds to a sample, and samples belong to a population. For example, the row name s1-2 means population 1 - sample 2. I would like to calculate the mean for every population, such as in the illustration (unfortunately, I cannot create a sample):

Is this possible in R? May I kindly ask for guidance?

CodePudding user response：

It's not clear why you can't create a sample. Here's one for the purposes of exposition:

set.seed(1)
dimnames <- paste(rep(paste0('s', 1:3), each = 3), rep(1:3, 3), sep = '-')
m <- matrix(sample(0:5, 81, TRUE), 9, dimnames = list(dimnames, dimnames))

m
#>      s1-1 s1-2 s1-3 s2-1 s2-2 s2-3 s3-1 s3-2 s3-3
#> s1-1    0    2    4    5    5    3    4    3    3
#> s1-2    3    0    4    0    3    0    1    5    4
#> s1-3    0    4    0    3    3    5    5    2    4
#> s2-1    1    4    0    0    3    1    5    0    3
#> s2-2    4    1    5    3    1    2    5    3    5
#> s2-3    2    5    4    2    3    1    0    4    4
#> s3-1    5    5    4    5    0    5    2    0    3
#> s3-2    1    1    1    1    5    5    2    0    3
#> s3-3    2    0    1    1    0    1    5    5    0

To get the mean of each row / column group, then assuming we can identify the group by the first two characters of the row or column name (as in your example), we could do:

groups <- expand.grid(row = unique(substr(rownames(m), 1, 2)), 
                      col = unique(substr(colnames(m), 1, 2)))

m2 <- matrix(unlist(Map(function(r, c) {
  mean(m[grep(r, rownames(m)), grep(c, rownames(m))])
  }, r = groups$row, c = groups$col)), 3, 
  dimnames = list(unique(substr(rownames(m), 1, 2)), 
                  unique(substr(colnames(m), 1, 2))))

Resuting in

m2
#>          s1       s2       s3
#> s1 1.888889 3.000000 3.444444
#> s2 2.888889 1.777778 3.222222
#> s3 2.222222 2.555556 2.222222