Aggregating Rows of an Identity Matrix

I have an N x N identity matrix and would like to aggregate (sum) its rows based on the numbers at the end of the row names, which are the same as the column names. The following simple example should make this clear.

Say I have the following identity matrix I:

I <- diag(16)
names <- paste(rep(c("aaa" , "bbb" , "ccc" , "ddd") , each = 4) , rep(c(1:4) , times = 4 ) , sep = "")
rownames(I) <- colnames(I) <- names

giving:

     aaa1 aaa2 aaa3 aaa4 bbb1 bbb2 bbb3 bbb4 ccc1 ccc2 ccc3 ccc4 ddd1 ddd2 ddd3 ddd4
aaa1    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
aaa2    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0
aaa3    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
aaa4    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
bbb1    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0
bbb2    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0
bbb3    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
bbb4    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0
ccc1    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0
ccc2    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0
ccc3    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0
ccc4    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0
ddd1    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0
ddd2    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0
ddd3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0
ddd4    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1

For example, I would like to sum, within each letter group, the rows whose names end in 1 or 2. This would give:

       aaa1 aaa2 aaa3 aaa4 bbb1 bbb2 bbb3 bbb4 ccc1 ccc2 ccc3 ccc4 ddd1 ddd2 ddd3 ddd4
aaa1 2    1    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0
aaa3      0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
aaa4      0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
bbb1 2    0    0    0    0    1    1    0    0    0    0    0    0    0    0    0    0
bbb3      0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
bbb4      0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0
ccc1 2    0    0    0    0    0    0    0    0    1    1    0    0    0    0    0    0
ccc3      0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0
ccc4      0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0
ddd1 2    0    0    0    0    0    0    0    0    0    0    0    0    1    1    0    0
ddd3      0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0
ddd4      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1

In this case I added up the rows whose names end in 1 and 2. Is there a way of doing this for any n x n matrix? It would also be extremely helpful if I could choose which rows to add up (not necessarily only 1 and 2).

Note: the numbers in the row and column names can also be above 10.

Note 2: the three-letter strings vary (for example "ABC22").

Any help is greatly appreciated!

CodePudding user response：

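You can build a grouping vector with cumsum, where rows that share a group number are collapsed together, and then pass it to aggregate. Rows whose names match the pattern (1, 3 or 4) start a new group; a row ending in 2 does not match, so it joins the row directly above it: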

s = cumsum(grepl("1|3|4", rownames(I)))
# [1]  1  1  2  3  4  4  5  6  7  7  8  9 10 10 11 12

aggregate(I, list(s), FUN=sum)[-1]

  aaa1 aaa2 aaa3 aaa4 bbb1 bbb2 bbb3 bbb4 ccc1 ccc2 ccc3 ccc4 ddd1 ddd2 ddd3 ddd4
1     1    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0
2     0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
3     0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
4     0    0    0    0    1    1    0    0    0    0    0    0    0    0    0    0
5     0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
6     0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0
7     0    0    0    0    0    0    0    0    1    1    0    0    0    0    0    0
8     0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0
9     0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0
10    0    0    0    0    0    0    0    0    0    0    0    0    1    1    0    0
11    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0
12    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1
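One caveat worth noting: the pattern "1|3|4" matches those digits anywhere in the name, so with numbers above 10 (which the question mentions) a name such as "aaa12" would also start a new group. A minimal sketch of a stricter variant follows, assuming the rows to merge are exactly those whose numeric suffix is 2 and that each such row sits directly below the row it should be merged into. The names `merge_up` and `res` are illustrative only.

```r
I <- diag(16)
nms <- paste0(rep(c("aaa", "bbb", "ccc", "ddd"), each = 4), rep(1:4, times = 4))
rownames(I) <- colnames(I) <- nms

# TRUE only when the numeric suffix is exactly 2 (not 12, 20, 22, ...)
merge_up <- grepl("[^0-9]2$", rownames(I))

# Every row that is NOT merged upward starts a new group
s <- cumsum(!merge_up)

# Same call as above; the result is a data frame with one row per group
res <- aggregate(I, list(s), FUN = sum)[-1]
```

For the example matrix this gives the same grouping vector as the original pattern; the two only differ once multi-digit suffixes appear.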

CodePudding user response：

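A modification of Maël's answer that is a little more agnostic to the row names. It rewrites a trailing 2 as 1, so that each "2" row gets the same grouping key as the matching "1" row: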

ind <- sub("(?<=\\D)2$", "1", rownames(I), perl = TRUE)
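As a quick check of this pattern against the question's notes (numbers above 10, varying letters), the sketch below applies the same sub() call to a few made-up names. The vector `x` is purely illustrative and is not part of the original data.

```r
# Illustrative names only; not taken from the original matrix
x <- c("aaa1", "aaa2", "aaa12", "ABC22", "bbb2")

# Replace a trailing 2 with 1 only when the character before it is a non-digit
ind_demo <- sub("(?<=\\D)2$", "1", x, perl = TRUE)
ind_demo
# [1] "aaa1"  "aaa1"  "aaa12" "ABC22" "bbb1"
```

Because of the lookbehind, suffixes such as 12 or 22 are left untouched; only a suffix of exactly 2 is rewritten.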

From there, aggregate returns a data frame (unfortunate, in my mind), which we can then convert back into a properly named matrix.

tmp <- aggregate(I, list(ind), FUN = sum)
rownames(tmp) <- tmp[[1]]
tmp <- as.matrix(tmp[,-1])

tmp
#      aaa1 aaa2 aaa3 aaa4 bbb1 bbb2 bbb3 bbb4 ccc1 ccc2 ccc3 ccc4 ddd1 ddd2 ddd3 ddd4
# aaa1    1    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0
# aaa3    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
# aaa4    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
# bbb1    0    0    0    0    1    1    0    0    0    0    0    0    0    0    0    0
# bbb3    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
# bbb4    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0
# ccc1    0    0    0    0    0    0    0    0    1    1    0    0    0    0    0    0
# ccc3    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0
# ccc4    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0
# ddd1    0    0    0    0    0    0    0    0    0    0    0    0    1    1    0    0
# ddd3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0
# ddd4    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1

This solution is more agnostic to the numbers in the names: it looks specifically for a trailing 2 and does not care about the other numbers. It also groups by the letters as a side effect, which I am inferring is intended.
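The question also asks to choose which rows to add up. One possible way to generalize the idea is sketched below, assuming every row name is a prefix followed by a number. The helper `collapse_rows` and its `suffixes` argument are hypothetical names introduced here, not part of either answer. It uses base R's rowsum(), which returns a matrix directly and so avoids the round trip through a data frame.

```r
# Hypothetical helper: within each prefix, sum the rows whose numeric suffix
# is in `suffixes`; all other rows are kept unchanged.
collapse_rows <- function(m, suffixes) {
  rn     <- rownames(m)
  prefix <- sub("[0-9]+$", "", rn)                # "aaa12" -> "aaa"
  num    <- as.integer(sub("^.*[^0-9]", "", rn))  # "aaa12" -> 12
  hit    <- num %in% suffixes
  # Merged rows share the key "<prefix><smallest chosen suffix>"
  key    <- ifelse(hit, paste0(prefix, min(suffixes)), rn)
  # reorder = FALSE keeps groups in order of first appearance
  rowsum(m, group = key, reorder = FALSE)
}

I <- diag(16)
nms <- paste0(rep(c("aaa", "bbb", "ccc", "ddd"), each = 4), rep(1:4, times = 4))
rownames(I) <- colnames(I) <- nms

out <- collapse_rows(I, c(1, 2))
dim(out)
# [1] 12 16
```

Calling `collapse_rows(I, c(3, 4))` would instead merge the rows ending in 3 and 4 under the "3" name. Unlike the cumsum approach, this does not depend on the merged rows being adjacent.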