Heatmap of Gene intensity values in R-CodePudding

I have data that look like this:

Gene	HBEC-KT-01	HBEC-KT-02	HBEC-KT-03	HBEC-KT-04	HBEC-KT-05	Primarycells-02	Primarycells-03	Primarycells-04	Primarycells-05
BPIFB1	15726000000	15294000000	15294000000	14741000000	22427000000	87308000000	2.00E 11	1.04E 11	1.51E 11
LCN2	18040000000	26444000000	28869000000	30337000000	10966000000	62388000000	54007000000	56797000000	38414000000
C3	2.52E 11	2.26E 11	1.80E 11	1.80E 11	1.78E 11	46480000000	1.16E 11	69398000000	78766000000
MUC5AC	15647000	8353200	12617000	12221000	29908000	40893000000	79830000000	28130000000	69147000000
MUC5B	965190000	693910000	779970000	716110000	1479700000	38979000000	90175000000	41764000000	50535000000
ANXA2	14705000000	18721000000	21592000000	18904000000	22657000000	28163000000	24282000000	21708000000	16528000000

I want to make a heatmap like the following using R. I am following a paper and they quoted "Heat maps were generated with the ‘pheatmap’ package76, where correlation clustering distance row was applied". Here is their heatmap.

I want the same like this and I am trying to make one using R by following tutorials but I am new to R language and know nothing about R.

Here is my code.

df <- read.delim("R.txt", header=T, row.names="Gene")
df_matrix <- data.matrix(df)
pheatmap(df_matrix, 
     main = "Heatmap of Extracellular Genes",
     color = colorRampPalette(rev(brewer.pal(n = 10, name = "RdYlBu")))(10),
     cluster_cols = FALSE,
     show_rownames = F,
     fontsize_col = 10,
     cellwidth = 40,
     )

This is what I get.

When I try using clustering, I got the error.

pheatmap(
mat = df_matrix,
  scale = "row",
  cluster_column = F,
  show_rownames = TRUE,
  drop_levels = TRUE,
  fontsize = 5,
  clustering_method = "complete",
  main = "Hierachical Cluster Analysis"
)

Error in hclust(d, method = method) : 
NA/NaN/Inf in foreign function call (arg 10)

Can someone help me with the code?

CodePudding user response：

You can normalize the data using scale to archive a more uniform coloring. Here, the mean expression is set to 0 for each sample. Genes lower expressed than average have a negative z score:

library(tidyverse)
library(pheatmap)

data <- tribble(
  ~Gene, ~`HBEC-KT-01`, ~`HBEC-KT-02`, ~`HBEC-KT-03`, ~`HBEC-KT-04`, ~`HBEC-KT-05`, ~`Primarycells-03`, ~`Primarycells-04`, ~`Primarycells-05`,
  "BPIFB1", 1.5726e 10, 1.5294e 10, 1.5294e 10, 1.4741e 10, 2.2427e 10, 2e 11, 1.04e 11, 1.51e 11,
  "LCN2", 1.804e 10, 2.6444e 10, 2.8869e 10, 3.0337e 10, 1.0966e 10, 5.4007e 10, 5.6797e 10, 3.8414e 10,
  "C3", 2.52e 11, 2.26e 11, 1.8e 11, 1.8e 11, 1.78e 11, 1.16e 11, 6.9398e 10, 7.8766e 10,
  "MUC5AC", 15647000, 8353200, 12617000, 12221000, 29908000, 7.983e 10, 2.813e 10, 6.9147e 10,
  "MUC5B", 965190000, 693910000, 779970000, 716110000, 1479700000, 9.0175e 10, 4.1764e 10, 5.0535e 10,
  "ANXA2", 1.4705e 10, 1.8721e 10, 2.1592e 10, 1.8904e 10, 2.2657e 10, 2.4282e 10, 2.1708e 10, 1.6528e 10
)
data %>%
  mutate(across(where(is.numeric), scale)) %>%
  column_to_rownames("Gene") %>%
  pheatmap(
    scale = "row",
    cluster_column = F,
    show_rownames = FALSE,
    show_colnames = TRUE,
    treeheight_col = 0,
    drop_levels = TRUE,
    fontsize = 5,
    clustering_method = "complete",
    main = "Hierachical Cluster Analysis (z-score)",
  )

^{Created on 2021-09-26 by the reprex package (v2.0.1)}