I have an [N x N] identity matrix and would like to aggregate (sum) its rows based on the numbers in the row and column names. I hope the following simple example explains this.
Say I have the following identity matrix I:
I <- diag(16)
names <- paste(rep(c("aaa" , "bbb" , "ccc" , "ddd") , each = 4) , rep(c(1:4) , times = 4 ) , sep = "")
rownames(I) <- colnames(I) <- names
giving:
aaa1 aaa2 aaa3 aaa4 bbb1 bbb2 bbb3 bbb4 ccc1 ccc2 ccc3 ccc4 ddd1 ddd2 ddd3 ddd4
aaa1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aaa2 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aaa3 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
aaa4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
bbb1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
bbb2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
bbb3 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
bbb4 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
ccc1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
ccc2 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
ccc3 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
ccc4 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
ddd1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
ddd2 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
ddd3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
ddd4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
My intention is, for example, to sum the rows whose names end in 1 and 2 within each letter group (aaa1 + aaa2, bbb1 + bbb2, and so on). This would give:
aaa1 aaa2 aaa3 aaa4 bbb1 bbb2 bbb3 bbb4 ccc1 ccc2 ccc3 ccc4 ddd1 ddd2 ddd3 ddd4
aaa1+2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aaa3 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
aaa4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
bbb1+2 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
bbb3 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
bbb4 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
ccc1+2 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
ccc3 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
ccc4 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
ddd1+2 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
ddd3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
ddd4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
In this case, I added the rows whose names end in 1 and 2. Is there a way to do this for any [n x n] matrix? It would also be extremely helpful if I could choose which rows to add up (not necessarily only 1 and 2). Note: the numbers in the row and column names can also be greater than 10. Note 2: the three-letter prefixes vary (for example "ABC22"). Any help is greatly appreciated!
Answer:
You can create the row groups with cumsum (rows that share a number are collapsed together) and then use aggregate:
s = cumsum(grepl("1|3|4", rownames(I)))
# [1] 1 1 2 3 4 4 5 6 7 7 8 9 10 10 11 12
aggregate(I, list(s), FUN=sum)[-1]
aaa1 aaa2 aaa3 aaa4 bbb1 bbb2 bbb3 bbb4 ccc1 ccc2 ccc3 ccc4 ddd1 ddd2 ddd3 ddd4
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
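The grouping above relies on grepl("1|3|4", ...), which matches those digits anywhere in a name, so names such as "ABC22" or "aaa5" would be merged into the preceding row. A minimal sketch of one way to parameterise it, assuming each name is a non-digit prefix followed by a number and that rows are sorted by prefix and then by number, as in the example; `to_merge` is a hypothetical variable name chosen for illustration:

```r
I <- diag(16)
nms <- paste0(rep(c("aaa", "bbb", "ccc", "ddd"), each = 4), rep(1:4, times = 4))
rownames(I) <- colnames(I) <- nms

# Numbers to collapse into one row per prefix (assumed consecutive, e.g. 1:2 or 1:3)
to_merge <- c(1, 2)

# Parse the trailing number instead of matching digits anywhere in the name,
# so "ABC22" is read as 22 and "aaa5" as 5
num <- as.integer(sub("^\\D+", "", rownames(I)))

# A row starts a new group unless its number is a later entry of `to_merge`;
# this only works because the rows to merge are adjacent (sorted input)
s <- cumsum(!(num %in% to_merge[-1]))

res <- aggregate(I, list(s), FUN = sum)[-1]
# Label each group with the name of its first row
rownames(res) <- rownames(I)[!duplicated(s)]
res
```

Setting `to_merge <- c(1, 2, 3)` should collapse rows 1 to 3 of each prefix into one row; non-consecutive choices such as `c(1, 3)` would break the `cumsum()` trick, because the row in between starts its own group.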
Answer:
A modification of Maël's answer that is a little more agnostic to the row names:
ind <- sub("(?<=\\D)2$", "1", rownames(I), perl = TRUE)
From there, aggregate produces a data frame (unfortunately, in my view), which we can then use to recreate the properly named matrix:
tmp <- aggregate(I, list(ind), FUN = sum)
rownames(tmp) <- tmp[[1]]
tmp <- as.matrix(tmp[,-1])
tmp
# aaa1 aaa2 aaa3 aaa4 bbb1 bbb2 bbb3 bbb4 ccc1 ccc2 ccc3 ccc4 ddd1 ddd2 ddd3 ddd4
# aaa1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# aaa3 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
# aaa4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
# bbb1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
# bbb3 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
# bbb4 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
# ccc1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
# ccc3 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
# ccc4 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
# ddd1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
# ddd3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
# ddd4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
This solution is more agnostic to the numbers in the names: it looks specifically for a trailing 2 preceded by a non-digit (so "aaa12" or "ABC22" are left alone) and ignores the other numbers. It also keeps a secondary grouping by the letters, which I infer is intended.
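Since aggregate() returns a data frame, base R's rowsum() may be a shorter route: it sums the rows of a matrix by a grouping vector and returns a matrix whose row names are the (by default sorted) group labels. A sketch reusing the same sub() regex on `I` from the question:

```r
I <- diag(16)
nms <- paste0(rep(c("aaa", "bbb", "ccc", "ddd"), each = 4), rep(1:4, times = 4))
rownames(I) <- colnames(I) <- nms

# Relabel a trailing 2 that follows a non-digit as 1, so "aaa2" joins "aaa1"
ind <- sub("(?<=\\D)2$", "1", rownames(I), perl = TRUE)

# rowsum() adds up rows sharing a label and keeps the result a matrix,
# so there is no group column to move into the row names afterwards
tmp <- rowsum(I, ind)
tmp
```

rowsum() also accepts `reorder = FALSE` if the groups should keep their order of first appearance instead of being sorted.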