plotting heatmap with triangular split tiles of more than one categorical variable in the same cell

Time:04-29

I have this dummy dataset:

dat <- data.frame(ID = c(1:8),
              col1=c("A B",NA,NA,"B C","A C","A",NA,"C"),
              col2=c("A D","B G",NA,NA,NA,NA,"E","C E"),
              col3=c("A A A A",NA,"B B C B B",NA,NA,"A D","D C","C E"),
              col4=c(NA,NA,NA,"B A B",NA,NA,NA,"E"),
              row.names = c("row1","row2","row3","row4","row5","row6","row7","row8"))

And I am trying to make a heatmap of this data frame in which each of the categorical values A, B, C, D, E and G gets its own color. If every cell contained only one value, the heatmap would be easy to plot with this code:

library(tidyr)
library(ggplot2)
dat3 <- dat %>% gather(key = "variable", value = "value", col1:col4)
ggplot(dat3, aes(variable, ID)) + geom_tile(aes(fill = value))

But because some cells contain more than one value, and in some cells the values even repeat, this is difficult to code. To show more than one value in the same cell, I'm thinking of splitting the cell into two triangles: the upper triangle colored according to one value and the lower triangle colored according to the second value. Can anyone help me with how to code this?
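Geometrically, the split described above cuts each tile along its diagonal. A minimal base-R sketch of the corner coordinates, assuming tiles of width and height 1 centred on integer positions (which is what geom_tile() draws for integer data by default); `triangle_vertices()` is a hypothetical helper, not part of ggplot2:

```r
# Hypothetical helper: corner coordinates of the two triangles that split a
# tile of width and height 2 * half centred on (x, y) along the diagonal from
# its bottom-left to its top-right corner.
triangle_vertices <- function(x, y, half = 0.5) {
  # Upper-left triangle: bottom-left, top-left and top-right corners
  upper <- data.frame(x = x + c(-half, -half, half),
                      y = y + c(-half, half, half),
                      part = "upper")
  # Lower-right triangle: bottom-left, bottom-right and top-right corners
  lower <- data.frame(x = x + c(-half, half, half),
                      y = y + c(-half, -half, half),
                      part = "lower")
  rbind(upper, lower)
}

# Corners for the tile at x position 1 (col1) and ID 1
verts <- triangle_vertices(1, 1)
print(verts)
```

In ggplot2 each triangle would then be one geom_polygon() group filled by its category; the lower-right offsets here are the same ones the answer's code adds to x and y.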

In Excel, the desired plot would look something like this: [image]

Answer:

As I already mentioned in my comment, the referenced answer by Allan is IMHO the way to go to achieve your desired result. However, to make the code work, your data first needs to be brought into the right shape: you complicated things by putting combinations of categories like "A B" into single cells. That can be fixed using e.g. tidyr::separate_rows, which splits such a cell into one row per category. With some renaming, Allan's code then works, and at least IMHO the result is close to the desired plot you added as an image.
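To see what that splitting does to individual cells, here is a base-R illustration. It uses strsplit() rather than separate_rows() itself, with what I understand to be separate_rows()'s default `sep` regex; treat it as an approximation of tidyr's behaviour, not its implementation. Note that repeated letters are kept, so "A A A A" still yields four entries:

```r
# Assumed to mirror separate_rows()'s default separator: any run of
# characters that are not letters, digits or dots.
cells <- c("A B", "B A B", "A A A A")
split_cells <- strsplit(cells, "[^[:alnum:].]+")
str(split_cells)
```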

library(ggplot2)
library(tidyr)
library(dplyr)

# Long format, then one row per category inside each cell
dat3 <- dat %>% 
  gather(key = "variable", value = "value", col1:col4) %>% 
  separate_rows(value)

# Rename dataset and variables
df <- dat3 %>% 
  dplyr::rename(x = variable, y = ID, group = value)

# https://stackoverflow.com/a/71148281/12993861
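# First value in each cell fills the whole tile
df1    <- df[!duplicated(interaction(df$x, df$y)), ]
# Any further values in the same cell are drawn as triangles on top
df2    <- df[duplicated(interaction(df$x, df$y)), ]
# Repeat each of those rows three times: one row per triangle corner
df2    <- df2[rep(seq_len(nrow(df2)), each = 3), ]
# Numeric x positions matching the discrete axis (levels taken from the full
# data so they line up even if a column has no second value); ID is already numeric
df2$x1 <- as.numeric(factor(df2$x, levels = sort(unique(df$x))))
df2$y1 <- df2$y
# Corners of the lower-right triangle: bottom-left, bottom-right, top-right
df2$x1 <- df2$x1 + c(-0.5, 0.5, 0.5)
df2$y1 <- df2$y1 + c(-0.5, -0.5, 0.5)
# One polygon id per triangle
df2$z  <- rep(seq_len(nrow(df2) / 3), each = 3)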

# geom_polygon() draws no outline by default, so no size/linewidth is needed
ggplot(df1, aes(x = x, y = y, fill = group)) +
  geom_tile() +
  geom_polygon(data = df2, aes(x = x1, y = y1, group = z))
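One thing to watch with the tile-plus-triangle approach: only the first value of a cell fills the tile, and only the last triangle drawn stays visible, so repeated letters can hide a category. For example, col4 of ID 4 is "B A B": the tile is B and the last triangle is also B, so A disappears. Keeping each category at most once per cell should avoid this; the self-contained base-R sketch below (strsplit() and unique() standing in for the tidyr/dplyr steps, not the answer's code) rebuilds the long data and counts the distinct categories per cell:

```r
# Rebuild the question's data (row names omitted)
dat <- data.frame(ID = 1:8,
                  col1 = c("A B", NA, NA, "B C", "A C", "A", NA, "C"),
                  col2 = c("A D", "B G", NA, NA, NA, NA, "E", "C E"),
                  col3 = c("A A A A", NA, "B B C B B", NA, NA, "A D", "D C", "C E"),
                  col4 = c(NA, NA, NA, "B A B", NA, NA, NA, "E"))

# Long format with one row per category; empty cells stay as a single NA row
long <- do.call(rbind, lapply(c("col1", "col2", "col3", "col4"), function(v) {
  parts <- lapply(dat[[v]], function(cell) {
    if (is.na(cell)) NA_character_ else strsplit(cell, "\\s+")[[1]]
  })
  data.frame(variable = v,
             ID = rep(dat$ID, lengths(parts)),
             value = unlist(parts))
}))

# Keep each category at most once per cell, e.g. "B A B" becomes B, A
long_unique <- unique(long)

# Distinct categories per (variable, ID) cell; the formula interface drops NA cells
counts <- aggregate(value ~ variable + ID, data = long_unique, FUN = length)
max(counts$value)
```

For this dataset the maximum is 2, so after de-duplication every cell fits into one tile plus one triangle. In the answer's pipeline, I would expect adding dplyr::distinct() after separate_rows(value) to have the same effect.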
