I'm trying to merge several matrices in R that all have unique column names but share only some of their rows. The matrices also have different numbers of rows, i.e. they do not all share the same row names. For example:
data1 <- matrix(seq(1,9), nrow = 3, ncol = 3)
rownames(data1) = c("gene1", "gene2", "gene3")
colnames(data1) = c("cell1", "cell2", "cell3")
data2 <- matrix(seq(1,12), nrow = 4, ncol = 3)
rownames(data2) = c("gene2", "gene3", "gene4", "gene5")
colnames(data2) = c("cell4", "cell5", "cell6")
# cell1 cell2 cell3
#gene1 1 4 7
#gene2 2 5 8
#gene3 3 6 9
# cell4 cell5 cell6
#gene2 1 5 9
#gene3 2 6 10
#gene4 3 7 11
#gene5 4 8 12
Now in a situation like this you could use merge and set all to TRUE:
totMatrix = merge(data1, data2, all=T)
However, this produces duplicate row names, i.e. it adds new rows for the different columns even when the row names are the same. merge also drops my row names. What I need instead is for the columns to be added to the rows that share a name, so that the result has unique row names and unique column names, like so:
# cell1 cell2 cell3 cell4 cell5 cell6
#gene1 1 4 7 NA NA NA
#gene2 2 5 8 1 5 9
#gene3 3 6 9 2 6 10
#gene4 NA NA NA 3 7 11
#gene5 NA NA NA 4 8 12
Anyone know how this might be done?
CodePudding user response:
Use by="row.names" so that merge joins on the row names instead of on shared columns:
data1 <- matrix(seq(1,9), nrow = 3, ncol = 3)
rownames(data1) = c("gene1", "gene2", "gene3")
colnames(data1) = c("cell1", "cell2", "cell3")
data2 <- matrix(seq(1,12), nrow = 4, ncol = 3)
rownames(data2) = c("gene2", "gene3", "gene4", "gene5")
colnames(data2) = c("cell4", "cell5", "cell6")
merge(data1, data2, by="row.names", all=T)
#> Row.names cell1 cell2 cell3 cell4 cell5 cell6
#> 1 gene1 1 4 7 NA NA NA
#> 2 gene2 2 5 8 1 5 9
#> 3 gene3 3 6 9 2 6 10
#> 4 gene4 NA NA NA 3 7 11
#> 5 gene5 NA NA NA 4 8 12
Created on 2022-11-23 with reprex v2.0.2
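The merge call above returns a data frame with the names in a Row.names column, and it handles only two inputs at a time. Since the question mentions several matrices and wants the row names kept, here is a minimal sketch of one way to extend it, assuming all column names are unique across the inputs. The helper merge_by_rownames and the third matrix data3 are made up for illustration and are not part of the original answer.

```r
# Example inputs: data1 and data2 as in the question, data3 is hypothetical.
data1 <- matrix(seq(1, 9), nrow = 3, ncol = 3,
                dimnames = list(c("gene1", "gene2", "gene3"),
                                c("cell1", "cell2", "cell3")))
data2 <- matrix(seq(1, 12), nrow = 4, ncol = 3,
                dimnames = list(c("gene2", "gene3", "gene4", "gene5"),
                                c("cell4", "cell5", "cell6")))
data3 <- matrix(seq(1, 4), nrow = 2, ncol = 2,
                dimnames = list(c("gene1", "gene5"),
                                c("cell7", "cell8")))

# Hypothetical helper: full outer join of a list of matrices on row names.
merge_by_rownames <- function(mats) {
  merged <- Reduce(function(x, y) {
    m <- merge(x, y, by = "row.names", all = TRUE)
    # merge() stores the names in a "Row.names" column; move them back
    # into the row names so the next merge can join on them again.
    rownames(m) <- as.character(m$Row.names)
    m[, -1, drop = FALSE]
  }, mats)
  # Convert the final data frame back to a matrix; row names are kept.
  as.matrix(merged)
}

totMatrix <- merge_by_rownames(list(data1, data2, data3))
```

Reduce applies the two-way merge pairwise across the list, so any number of matrices can be passed. Note that merge sorts the rows by name as character strings, so with names like gene10 the order will be alphabetical rather than numeric.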