merge all matrices in R with different numbers of rows and columns

Time:11-24

I'm trying to merge several matrices in R that all have unique column names but share some rows. The sets of rows also differ, i.e. the matrices do not share ALL the same row names. For example:

data1 <- matrix(seq(1,9), nrow = 3, ncol = 3)
rownames(data1) = c("gene1", "gene2", "gene3")
colnames(data1) = c("cell1", "cell2", "cell3")
data2 <- matrix(seq(1,12), nrow = 4, ncol = 3)
rownames(data2) = c("gene2", "gene3", "gene4", "gene5")
colnames(data2) = c("cell4", "cell5", "cell6")

#       cell1   cell2   cell3
#gene1    1        4       7
#gene2    2        5       8
#gene3    3        6       9

#       cell4   cell5   cell6
#gene2    1        5       9
#gene3    2        6       10
#gene4    3        7       11
#gene5    4        8       12

Now in a situation like this you could use merge and set all to TRUE:

totMatrix = merge(data1, data2, all = TRUE)

However, this produces duplicated rows: instead of combining rows that share a name, it adds new rows for the other matrix's columns. merge also drops my row names. What I need instead is for the columns to be added to the rows sharing the same name, so that I end up with unique row names and unique column names, like so:

#       cell1   cell2   cell3   cell4   cell5   cell6
#gene1    1        4       7     NA       NA      NA
#gene2    2        5       8     1         5      9
#gene3    3        6       9     2         6      10
#gene4    NA       NA      NA    3         7      11
#gene5    NA       NA      NA    4         8      12

Anyone know how this might be done?

CodePudding user response:

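Use by = "row.names" so that merge joins on the row names. The row names come back as a Row.names column in the resulting data frame: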

data1 <- matrix(seq(1,9), nrow = 3, ncol = 3)
rownames(data1) = c("gene1", "gene2", "gene3")
colnames(data1) = c("cell1", "cell2", "cell3")
data2 <- matrix(seq(1,12), nrow = 4, ncol = 3)
rownames(data2) = c("gene2", "gene3", "gene4", "gene5")
colnames(data2) = c("cell4", "cell5", "cell6")

merge(data1, data2, by = "row.names", all = TRUE)

#>   Row.names cell1 cell2 cell3 cell4 cell5 cell6
#> 1     gene1     1     4     7    NA    NA    NA
#> 2     gene2     2     5     8     1     5     9
#> 3     gene3     3     6     9     2     6    10
#> 4     gene4    NA    NA    NA     3     7    11
#> 5     gene5    NA    NA    NA     4     8    12

Created on 2022-11-23 with reprex v2.0.2
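
Since the question asks for a matrix with row names, and for more than two inputs, here is a minimal sketch that chains the same merge call over a list with Reduce and converts the result back to a matrix. The helper name merge_matrices and the third matrix data3 are made up for illustration and are not part of the original answer. It assumes every matrix has row names and that column names are unique across matrices.

```r
# Hypothetical helper: outer-join any number of named matrices on their
# row names and return a matrix again.
merge_matrices <- function(mats) {
  merged <- Reduce(function(x, y) {
    m <- merge(x, y, by = "row.names", all = TRUE)
    # merge() moves the row names into a "Row.names" column; put them back
    # so the next merge in the chain can join on row names again.
    rownames(m) <- as.character(m$Row.names)
    m$Row.names <- NULL
    m
  }, lapply(mats, as.data.frame))
  as.matrix(merged)
}

data1 <- matrix(1:9, nrow = 3,
                dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))
data2 <- matrix(1:12, nrow = 4,
                dimnames = list(paste0("gene", 2:5), paste0("cell", 4:6)))
# Made-up third matrix, only to show that the chain handles more than two inputs.
data3 <- matrix(1:4, nrow = 2,
                dimnames = list(c("gene1", "gene6"), c("cell7", "cell8")))

totMatrix <- merge_matrices(list(data1, data2, data3))
dim(totMatrix)  # 6 8
```

Cells that have no value in a given input matrix are filled with NA, as in the two-matrix output above. If two matrices did share a column name, merge would append suffixes such as .x and .y, so this sketch is only suitable when column names are unique.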
