I have a simple data frame with normalized gene expression values. The df has genes as row names, samples in each column and expression values as follows:
      sample1 sample2 sample3 sample4 sample5
gene1       1       4      .6      14      20
gene2       5      16      14       1      22
gene3       4       1      99       3      53
gene4      87       2      12       3      19
gene5      33      77      15      14      22
I am trying to calculate the correlation between genes in each sample. I have used the code:
Mat_1 <- as.matrix(cor(t(df_m)))
But the resulting matrix drops my column names and puts the gene names on both the rows and the columns instead. Is there a way to compute this correlation matrix while preserving the gene and sample names in the rows and columns, respectively? Or at least to retrieve the sample names for the values so that they can be visualized by sample?
Answer:
You can do it this way:
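# Rebuild the example data: one column per sample, one row per gene
dat <- data.frame(sample1 = c(1, 5, 4, 87, 33),
                  sample2 = c(4, 16, 1, 2, 77),
                  sample3 = c(.6, 14, 99, 12, 15),
                  sample4 = c(14, 1, 3, 3, 14),
                  sample5 = c(20, 22, 53, 19, 22))
rownames(dat) <- paste0("gene", 1:5)
# Correlate the columns of t(dat) (genes) with the columns of dat (samples)
cor(t(dat), dat)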
#>          sample1     sample2    sample3    sample4    sample5
#> gene1  0.6903946  0.75223370 -0.3819662  0.2896108 -0.4613392
#> gene2 -0.4466371  0.74893869  0.2379466  0.1506124  0.2424242
#> gene3 -0.2759601  0.18105885  0.8857378 -0.0134290  0.8841590
#> gene4 -0.4172298 -0.12982712 -0.3281653  0.7309607 -0.2186932
#> gene5 -0.4621283 -0.03506164 -0.3451630 -0.2862352 -0.3168201
# Check the upper-left cell by hand: sample1 column vs. gene1 row
cor(dat$sample1, unlist(dat[1, ]))
#> [1] 0.6903946
Created on 2022-04-06 by the reprex package (v2.0.1)
Note, however, that each cell is the correlation between the corresponding row and column of the original data. As shown above, the upper-left cell is the correlation between the gene1 row and the sample1 column. This also works only because the example has as many genes as samples: cor(x, y) requires x and y to have the same number of rows.
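To make the naming behaviour concrete, here is a minimal sketch using the same example data. It relies on the fact that cor() correlates the columns of its input, so the result is always labelled with those column names. The object names (gene_cor, sample_cor, cross_cor, cross_long) and the long-format column names are illustrative choices, not part of the original answer.

```r
# Same example data as in the answer above.
dat <- data.frame(sample1 = c(1, 5, 4, 87, 33),
                  sample2 = c(4, 16, 1, 2, 77),
                  sample3 = c(.6, 14, 99, 12, 15),
                  sample4 = c(14, 1, 3, 3, 14),
                  sample5 = c(20, 22, 53, 19, 22))
rownames(dat) <- paste0("gene", 1:5)

# cor() correlates columns, so the dimnames of the result are the
# column names of whatever was passed in.
gene_cor   <- cor(t(dat))  # columns of t(dat) are genes -> gene x gene
sample_cor <- cor(dat)     # columns of dat are samples  -> sample x sample

# The mixed gene-by-sample matrix from the answer.
cross_cor <- cor(t(dat), dat)

# Any cell can be reproduced by hand, e.g. gene3 row vs. sample5 column.
manual <- cor(unlist(dat["gene3", ]), dat$sample5)

# Long format (one row per gene/sample pair), which is usually easier
# to plot or to filter by sample. Column names here are assumptions.
cross_long <- as.data.frame(as.table(cross_cor))
names(cross_long) <- c("gene", "sample", "correlation")
head(cross_long)
```

If the goal is to compare samples with each other, sample_cor is probably the matrix to visualize. The long-format table is only worth building if the mixed gene-by-sample values are what is actually wanted.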