Is there a way in R to combine matrices of different sizes by rows AND columns?

Time:05-08

I have a list of matrices in R. Each matrix has row and column names, which sometimes overlap with the row and column names of other matrices in the list. For example:

Mat1 <- as.matrix( read.table(text="Col1 Col2 Col3
Row1     0  0   0
Row2     1  0   5
Row3     5  2   0", head=TRUE))

Mat2<- as.matrix( read.table(text="Col1 Col3 Col4
Row2     0  0   0
Row3     1  0   5
Row4     5  2   0",head=TRUE))

How can I combine all the matrices in the list such that (1) where the rows & columns intersect, the numbers are added together? (2) where the rows & columns do not intersect, the value from the original matrix is preserved?

All the examples I've found online (e.g. using the 'merge' function) are focused on only column merging or only row merging.

CodePudding user response:

##
#   this is a minimal reproducible example
#   ###    YOU should provide this     ###
#
m1 <- matrix(c(0,0,0,1,0,5,5,2,0), ncol = 3, byrow = TRUE, 
             dimnames = list(c('row.1', 'row.2', 'row.3'), c('col.1', 'col.2', 'col.3')))
m2 <- matrix(c(0,0,0,1,0,5,5,2,0), ncol = 3, byrow = TRUE, 
             dimnames = list(c('row.2', 'row.3', 'row.4'), c('col.1', 'col.3', 'col.4')))
##
#   you start here
#
library(data.table)
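# melt each matrix to long form (rn, variable, value) and stack the results
m  <- rbind(melt(as.data.table(m1, keep.rownames = TRUE), id.vars = 'rn'),
            melt(as.data.table(m2, keep.rownames = TRUE), id.vars = 'rn'))
m[is.na(value), value := 0]
# cast back to wide form, summing cells that share a row and column name
dcast(m, rn ~ variable, value.var = 'value', fun.aggregate = sum)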
##       rn col.1 col.2 col.3 col.4
## 1: row.1     0     0     0     0
## 2: row.2     1     0     5     0
## 3: row.3     6     2     0     5
## 4: row.4     5     0     2     0
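
Since the question mentions a list of matrices, the same melt-and-cast idea might be extended roughly as follows. This is only a sketch: the names mats, long, wide and res are illustrative, and it assumes every matrix has unique row and column names and that a reasonably recent data.table is installed.

```r
library(data.table)

# Same example matrices as above
m1 <- matrix(c(0,0,0,1,0,5,5,2,0), ncol = 3, byrow = TRUE,
             dimnames = list(c('row.1', 'row.2', 'row.3'), c('col.1', 'col.2', 'col.3')))
m2 <- matrix(c(0,0,0,1,0,5,5,2,0), ncol = 3, byrow = TRUE,
             dimnames = list(c('row.2', 'row.3', 'row.4'), c('col.1', 'col.3', 'col.4')))
mats <- list(m1, m2)   # illustrative name for the input list

# Melt every matrix to long form (rn, variable, value) and stack them
long <- rbindlist(lapply(mats, function(x)
  melt(as.data.table(x, keep.rownames = TRUE), id.vars = 'rn')))

# Cast back to wide form, summing cells that share a row and column name
wide <- dcast(long, rn ~ variable, value.var = 'value', fun.aggregate = sum)

# Turn the result back into an ordinary matrix with row names
res <- as.matrix(wide, rownames = 'rn')
res
```

Converting back with as.matrix is optional; it is only there because the question started from matrices rather than data tables.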

CodePudding user response:

The question states that the input is a list of matrices, so assume that the list is L, shown below, where Mat1 and Mat2 are as defined in the question. Convert each matrix to a long-form data frame with three columns: the row names, the column names and the values. as.data.frame.table names these columns Var1, Var2 and Freq. Then rbind the individual data frames together and use tapply to sum the values, which converts the result back to a matrix at the same time.

The question did not specify how to handle cells that do not appear in any matrix in the list, so we have used 0; omit default = 0 if NA is desired. We have also removed the names of the dimnames (Var1 and Var2); omit that line if retaining them is acceptable.

No packages are used.

L <- list(Mat1, Mat2)

long <- do.call("rbind", lapply(L, as.data.frame.table)) 
m <- tapply(long[[3]], long[-3], sum, default = 0)
names(dimnames(m)) <- NULL  # optional

m

giving:

     Col1 Col2 Col3 Col4
Row1    0    0    0    0
Row2    1    0    5    0
Row3    6    2    0    5
Row4    5    0    2    0
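
To make the effect of the default argument concrete, the steps above could be wrapped in a small helper. This is a sketch rather than part of the original answer: add_matrices is a hypothetical name, the matrices are rebuilt with matrix() so the block runs on its own, and it assumes a version of R recent enough for tapply to accept default.

```r
# Same values as Mat1 and Mat2 in the question, built without read.table
Mat1 <- matrix(c(0,0,0, 1,0,5, 5,2,0), ncol = 3, byrow = TRUE,
               dimnames = list(c("Row1", "Row2", "Row3"), c("Col1", "Col2", "Col3")))
Mat2 <- matrix(c(0,0,0, 1,0,5, 5,2,0), ncol = 3, byrow = TRUE,
               dimnames = list(c("Row2", "Row3", "Row4"), c("Col1", "Col3", "Col4")))

# Hypothetical helper: sum a list of named matrices cell by cell.
# 'default' is the value used for cells that appear in none of the matrices.
add_matrices <- function(mats, default = 0) {
  long <- do.call("rbind", lapply(mats, as.data.frame.table))
  out <- tapply(long[[3]], long[-3], sum, default = default)
  names(dimnames(out)) <- NULL
  out
}

res0  <- add_matrices(list(Mat1, Mat2))                # missing cells become 0
resNA <- add_matrices(list(Mat1, Mat2), default = NA)  # missing cells stay NA

res0["Row3", "Col1"]   # 5 from Mat1 plus 1 from Mat2
resNA["Row1", "Col4"]  # this cell is in neither matrix
```

With default = NA, a cell that is absent from every matrix can be told apart from a cell whose values sum to zero, which may or may not matter for the data at hand.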

CodePudding user response:

@jlhoward's answer shows you how to do it with the data.table package. Here's a way using base functions.

Mat1 <- as.matrix( read.table(text="Col1 Col2 Col3
Row1     0  0   0
Row2     1  0   5
Row3     5  2   0", head=TRUE))

Mat2<- as.matrix( read.table(text="Col1 Col3 Col4
Row2     0  0   0
Row3     1  0   5
Row4     5  2   0",head=TRUE))

# Get the row and column names 

rn1 <- rownames(Mat1)
rn2 <- rownames(Mat2)
cn1 <- colnames(Mat1)
cn2 <- colnames(Mat2)

# Construct row and column names for the sum matrix
rnsum <- unique(c(rn1, rn2))
cnsum <- unique(c(cn1, cn2))

# Make the matrix of zeros
sum <- matrix(0, length(rnsum), length(cnsum),
              dimnames = list(rnsum, cnsum))

# Put all indices of each matrix into a matrix
# with column 1 being the row name, column 2 being the 
# column name, and add the results into the sum

ind <- cbind(rn1[row(Mat1)], cn1[col(Mat1)])
sum[ind] <- sum[ind] + Mat1[ind]

ind <- cbind(rn2[row(Mat2)], cn2[col(Mat2)])
sum[ind] <- sum[ind] + Mat2[ind]

sum
#>      Col1 Col2 Col3 Col4
#> Row1    0    0    0    0
#> Row2    1    0    5    0
#> Row3    6    2    0    5
#> Row4    5    0    2    0

Created on 2022-05-08 by the reprex package (v2.0.1)

If the matrices were actually in a list (e.g. thelist <- list(Mat1, Mat2)), then I'd just put all of this code into a loop, e.g.

sum <- matrix(0, 0, 0)
for (i in seq_along(thelist)) {
   Mat1 <- sum
   Mat2 <- thelist[[i]]
   
   ... same code as above ...
}
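
For reference, one way the loop might be written out in full is sketched below. It is a variation on the code above, not a literal expansion of it: instead of growing the result on each pass, it collects all row and column names first and then adds each matrix into a matrix of zeros using the same two-column name index. The names add_by_names and total are hypothetical, and the sketch assumes every matrix has unique row and column names.

```r
# Same values as Mat1 and Mat2 in the question, built without read.table
Mat1 <- matrix(c(0,0,0, 1,0,5, 5,2,0), ncol = 3, byrow = TRUE,
               dimnames = list(c("Row1", "Row2", "Row3"), c("Col1", "Col2", "Col3")))
Mat2 <- matrix(c(0,0,0, 1,0,5, 5,2,0), ncol = 3, byrow = TRUE,
               dimnames = list(c("Row2", "Row3", "Row4"), c("Col1", "Col3", "Col4")))
thelist <- list(Mat1, Mat2)

# Hypothetical helper: sum a list of named matrices by row and column name
add_by_names <- function(mats) {
  # Union of all row and column names, in order of first appearance
  rn <- unique(unlist(lapply(mats, rownames)))
  cn <- unique(unlist(lapply(mats, colnames)))
  total <- matrix(0, length(rn), length(cn), dimnames = list(rn, cn))
  for (m in mats) {
    # Two-column character matrix: (row name, column name) for every cell of m
    ind <- cbind(rownames(m)[row(m)], colnames(m)[col(m)])
    total[ind] <- total[ind] + m[ind]
  }
  total
}

result <- add_by_names(thelist)
result
```

Calling the accumulator total rather than sum also avoids masking the base sum function, which can be confusing when reading the code later.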