How to preserve column names with transpose matrix correlation

I have a simple data frame with normalized gene expression values. The data frame has genes as row names, one sample per column, and expression values as follows:

      sample1 sample2 sample3 sample4 sample5
gene1   1        4       .6     14      20
gene2   5        16      14      1      22
gene3   4        1       99      3      53
gene4   87       2       12      3      19
gene5   33       77      15     14      22

I am trying to calculate the correlation between genes in each sample. I have used the code:

Mat_1 <- as.matrix(cor(t(df_m)))

But the resulting matrix drops my column names and puts the gene names in both the rows and the columns instead. Is there a way to compute this correlation matrix while preserving the gene names in the rows and the sample names in the columns? Or at least to retrieve the sample names for the values, so that they can be visualized by sample?

CodePudding user response:

You can do it this way:

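# Rebuild the example data: one column per sample, one row per gene
dat <- data.frame(sample1 = c(1, 5, 4, 87, 33),
                  sample2 = c(4, 16, 1, 2, 77),
                  sample3 = c(.6, 14, 99, 12, 15),
                  sample4 = c(14, 1, 3, 3, 14),
                  sample5 = c(20, 22, 53, 19, 22))

rownames(dat) <- paste0("gene", 1:5)

# Correlate the columns of t(dat) (genes) with the columns of dat (samples)
cor(t(dat), dat)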
#>          sample1     sample2    sample3    sample4    sample5
#> gene1  0.6903946  0.75223370 -0.3819662  0.2896108 -0.4613392
#> gene2 -0.4466371  0.74893869  0.2379466  0.1506124  0.2424242
#> gene3 -0.2759601  0.18105885  0.8857378 -0.0134290  0.8841590
#> gene4 -0.4172298 -0.12982712 -0.3281653  0.7309607 -0.2186932
#> gene5 -0.4621283 -0.03506164 -0.3451630 -0.2862352 -0.3168201

# Check the upper-left cell: sample1 column against the gene1 row
cor(dat$sample1, unlist(dat[1, ]))
#> [1] 0.6903946

Created on 2022-04-06 by the reprex package (v2.0.1)

Note, however, that each cell is the correlation between the corresponding row and column of the original data. For example, the upper-left cell is the correlation between the gene1 row and the sample1 column of the original data, so this is not a gene-to-gene correlation.
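For comparison, here is a minimal sketch of how the three calls differ, using the same example data. The names gene_cor, sample_cor, dat6, gene6 and mismatch are made up for illustration and the gene6 values are arbitrary. cor(x, y) correlates the columns of x with the columns of y, so both arguments need the same number of rows. The gene-by-sample call therefore only runs when there are as many genes as samples, which I would expect to be rare with real expression data.

```r
# Same example data as in the answer above.
dat <- data.frame(sample1 = c(1, 5, 4, 87, 33),
                  sample2 = c(4, 16, 1, 2, 77),
                  sample3 = c(.6, 14, 99, 12, 15),
                  sample4 = c(14, 1, 3, 3, 14),
                  sample5 = c(20, 22, 53, 19, 22))
rownames(dat) <- paste0("gene", 1:5)

# Gene-by-gene: each gene's profile across samples is one variable,
# so the gene names label both the rows and the columns.
gene_cor <- cor(t(dat))

# Sample-by-sample: each sample's profile across genes is one variable,
# so the sample names label both the rows and the columns.
sample_cor <- cor(dat)

# Hypothetical sixth gene (arbitrary values): the data is no longer square,
# so t(dat6) has 5 rows and dat6 has 6, and cor() cannot pair them up.
dat6 <- dat
dat6["gene6", ] <- c(2, 8, 3, 9, 1)
mismatch <- tryCatch(cor(t(dat6), dat6),
                     error = function(e) conditionMessage(e))
print(mismatch)
```

If the goal is a plot organized by sample, sample_cor (or the original expression values themselves) is probably the object to work with, since a gene-to-gene correlation is a single number per gene pair computed across all samples.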