Aggregating Rows of an Identity Matrix

Time:08-25

I have an N x N identity matrix and would like to aggregate (sum) some of its rows based on the numbers in the row names, which are the same as the column names. I hope the following simple example makes this clear.

Say I have the following identity matrix I:

I <- diag(16)
names <- paste(rep(c("aaa" , "bbb" , "ccc" , "ddd") , each = 4) , rep(c(1:4) , times = 4 ) , sep = "")
rownames(I) <- colnames(I) <- names

giving:

     aaa1 aaa2 aaa3 aaa4 bbb1 bbb2 bbb3 bbb4 ccc1 ccc2 ccc3 ccc4 ddd1 ddd2 ddd3 ddd4
aaa1    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
aaa2    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0
aaa3    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
aaa4    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
bbb1    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0
bbb2    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0    0
bbb3    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
bbb4    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0
ccc1    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0
ccc2    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0    0
ccc3    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0
ccc4    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0
ddd1    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0
ddd2    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0
ddd3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0
ddd4    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1

My intention would be, for example, to aggregate, within each letter group, the rows whose names end in 1 or 2. This would give:

       aaa1 aaa2 aaa3 aaa4 bbb1 bbb2 bbb3 bbb4 ccc1 ccc2 ccc3 ccc4 ddd1 ddd2 ddd3 ddd4
aaa1 2    1    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0
aaa3      0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
aaa4      0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
bbb1 2    0    0    0    0    1    1    0    0    0    0    0    0    0    0    0    0
bbb3      0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
bbb4      0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0
ccc1 2    0    0    0    0    0    0    0    0    1    1    0    0    0    0    0    0
ccc3      0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0
ccc4      0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0
ddd1 2    0    0    0    0    0    0    0    0    0    0    0    0    1    1    0    0
ddd3      0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0
ddd4      0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1

In this case I added up all rows whose names end in 1 or 2. Is there a way of doing this for any matrix with n x n dimensions? It would also be extremely helpful if I could choose which rows to add up (not necessarily only 1 and 2).

Note: the numbers in the column and row names can also take values above 10.
Note 2: the three-letter strings vary (for example "ABC22").

Any help is greatly appreciated!

CodePudding user response:

You can create the row groups with cumsum (rows that share a group number are collapsed together), and then use aggregate:

s = cumsum(grepl("1|3|4", rownames(I)))
# [1]  1  1  2  3  4  4  5  6  7  7  8  9 10 10 11 12

aggregate(I, list(s), FUN=sum)[-1]
  aaa1 aaa2 aaa3 aaa4 bbb1 bbb2 bbb3 bbb4 ccc1 ccc2 ccc3 ccc4 ddd1 ddd2 ddd3 ddd4
1     1    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0
2     0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
3     0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
4     0    0    0    0    1    1    0    0    0    0    0    0    0    0    0    0
5     0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
6     0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0
7     0    0    0    0    0    0    0    0    1    1    0    0    0    0    0    0
8     0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0
9     0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0
10    0    0    0    0    0    0    0    0    0    0    0    0    1    1    0    0
11    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0
12    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1
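
If a matrix with readable row names is preferred over a data frame, base R's rowsum() can do the summing instead. The following is only a sketch under the same assumption as the pattern above (single-digit suffixes, as in the example); the variable names `nms`, `labels` and `res` are mine, not part of the answer.

```r
# Rebuild the example matrix so the snippet runs on its own.
I <- diag(16)
nms <- paste0(rep(c("aaa", "bbb", "ccc", "ddd"), each = 4), rep(1:4, times = 4))
rownames(I) <- colnames(I) <- nms

# Same grouping vector as above: a row ending in 2 joins the row before it.
s <- cumsum(grepl("1|3|4", rownames(I)))

# Label each group by the name of its first row.
labels <- rownames(I)[match(s, s)]

# rowsum() sums rows by group and returns a matrix;
# reorder = FALSE keeps the groups in order of first appearance.
res <- rowsum(I, group = labels, reorder = FALSE)
dim(res)
```

The result has 12 rows named aaa1, aaa3, aaa4, bbb1, and so on, with the aaa1 row holding the sum of the original aaa1 and aaa2 rows.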

CodePudding user response:

A modification on Maël's that is a little more agnostic to the row names.

ind <- sub("(?<=\\D)2$", "1", rownames(I), perl = TRUE)
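
To see why the lookbehind matters for names with larger numbers, here is a small check on a few made-up names (the vectors `example_names` and `ind_demo` are only for illustration). Only a trailing 2 that directly follows a non-digit is rewritten, so names such as "aaa12" or "ABC22" should be left alone.

```r
# Made-up names covering single-digit and multi-digit suffixes.
example_names <- c("aaa1", "aaa2", "aaa12", "ABC22", "bbb2")

# (?<=\\D) requires the character before the final 2 to be a non-digit.
ind_demo <- sub("(?<=\\D)2$", "1", example_names, perl = TRUE)
ind_demo
```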

From there, aggregate returns a data frame (unfortunate, in my view), which we can then convert back into a properly named matrix.

tmp <- aggregate(I, list(ind), FUN = sum)
rownames(tmp) <- tmp[[1]]
tmp <- as.matrix(tmp[,-1])

tmp
#      aaa1 aaa2 aaa3 aaa4 bbb1 bbb2 bbb3 bbb4 ccc1 ccc2 ccc3 ccc4 ddd1 ddd2 ddd3 ddd4
# aaa1    1    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0
# aaa3    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0
# aaa4    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0
# bbb1    0    0    0    0    1    1    0    0    0    0    0    0    0    0    0    0
# bbb3    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0    0
# bbb4    0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0
# ccc1    0    0    0    0    0    0    0    0    1    1    0    0    0    0    0    0
# ccc3    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0    0
# ccc4    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0
# ddd1    0    0    0    0    0    0    0    0    0    0    0    0    1    1    0    0
# ddd3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0
# ddd4    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1

This solution is more agnostic to the numbers in the names: it looks specifically for a trailing 2 and does not care about the other numbers. It also groups by the letters as a secondary key, which I'm inferring is intended.
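
The question also asks for a way to choose which rows to add up and to cope with numbers above 10. One possible generalization is sketched below. The helper `merge_rows` and its `suffixes` argument are hypothetical names introduced here, and the sketch assumes every row name is a non-digit-terminated prefix followed by a number (for example "ABC22").

```r
# Hypothetical helper: within each prefix, sum the rows whose numeric
# suffix is listed in `suffixes`; all other rows are kept unchanged.
merge_rows <- function(m, suffixes) {
  nm <- rownames(m)
  prefix <- sub("\\d+$", "", nm)               # "ABC22" -> "ABC"
  number <- as.integer(sub("^.*\\D", "", nm))  # "ABC22" -> 22
  # Merged rows share the label prefix + first requested suffix;
  # every other row keeps its own name as its group label.
  key <- ifelse(number %in% suffixes, paste0(prefix, suffixes[1]), nm)
  # rowsum() returns a matrix; reorder = FALSE keeps first-appearance order.
  rowsum(m, group = key, reorder = FALSE)
}

# Usage on the example matrix from the question.
I <- diag(16)
nms <- paste0(rep(c("aaa", "bbb", "ccc", "ddd"), each = 4), rep(1:4, times = 4))
rownames(I) <- colnames(I) <- nms
res <- merge_rows(I, c(1, 2))
rownames(res)
```

Because the suffix is compared as a whole number rather than matched as a character, a call such as merge_rows(m, c(12, 22)) should merge only the rows ending in exactly 12 or 22 and leave a row ending in 2 untouched.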
