Creating a heatmap based on proportions extracted from categorical data

Time:09-07

Here is a sample dataset:

df1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                  AgeGroup = c("Y","O","Y","Y","Y","O","Y","O","O","Y"),
                  Sex = c("m","m","f","f","m","f","m","m","f","f"),
                  marker1 = c('yes', 'yes', 'no', 'yes', 'no','no', 'yes', 'no', 'yes', 'no'),
                  marker2 = c('no', 'yes', 'no', 'yes', 'no','yes', 'yes', 'no', 'yes', 'no'),
                  marker3 = c('yes', 'no', 'no', 'yes', 'yes','no', 'yes', 'no','no', 'yes'))

df1

   ID AgeGroup Sex marker1 marker2 marker3
1   1        Y   m     yes      no     yes
2   2        O   m     yes     yes      no
3   3        Y   f      no      no      no
4   4        Y   f     yes     yes     yes
5   5        Y   m      no      no     yes
6   6        O   f      no     yes      no
7   7        Y   m     yes     yes     yes
8   8        O   m      no      no      no
9   9        O   f     yes     yes      no
10 10        Y   f      no      no     yes

Each column holds a categorical variable. AgeGroup has two levels, Y and O (young and old), and the other columns should be self-explanatory. The goal is to investigate the presence of the three markers for each combination of AgeGroup and Sex. In a new table, I would like to calculate the proportion of "yes" for marker1, marker2 and marker3 in each of the four possible AgeGroup/Sex combinations (Y_m, O_m, Y_f, O_f). I then want to display these proportions in a heatmap with the markers on the x-axis and the four groups on the y-axis, so that each tile shows the proportion of "yes" for one marker in one AgeGroup/Sex combination. Can you please help me with that?
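To make the target quantity concrete (an illustrative base R sketch added here, not part of the original question): the proportion of "yes" for a marker within a group is the mean of the logical vector `marker == "yes"` over that group's rows. For example, Y_m holds IDs 1, 5 and 7, and marker1 is "yes" for IDs 1 and 7, giving 2/3. The names `group`, `markers` and `prop_mat` are made up for this example.

```r
# Sample data from the question
df1 <- data.frame(
  ID       = 1:10,
  AgeGroup = c("Y", "O", "Y", "Y", "Y", "O", "Y", "O", "O", "Y"),
  Sex      = c("m", "m", "f", "f", "m", "f", "m", "m", "f", "f"),
  marker1  = c("yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "no"),
  marker2  = c("no", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "no"),
  marker3  = c("yes", "no", "no", "yes", "yes", "no", "yes", "no", "no", "yes")
)

# Group label such as "Y_m" for every row (hypothetical helper variable)
group <- paste(df1$AgeGroup, df1$Sex, sep = "_")

# For each marker, tapply() takes the mean of (marker == "yes") within each
# group, i.e. the share of "yes"; sapply() binds the results into a matrix
# with one row per group and one column per marker
markers <- c("marker1", "marker2", "marker3")
prop_mat <- sapply(markers, function(m) tapply(df1[[m]] == "yes", group, mean))

round(prop_mat, 2)
```

The rows come out in alphabetical order (O_f, O_m, Y_f, Y_m) because `tapply()` turns the group labels into a factor with sorted levels.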

Answer:

After some data manipulation, you can use geom_tile:

library(tidyverse)

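# Data wrangling: share of "yes" per marker within each AgeGroup x Sex group
# (note: this overwrites df1 with the summarised, long-format table)
df1 <- 
  df1 %>% 
  group_by(AgeGroup, Sex) %>% 
  summarise(across(starts_with("marker"), 
                   ~ sum(.x == "yes") / n())) %>% 
  ungroup() %>% 
  # combine the two grouping columns into one label, e.g. "Y_m"
  mutate(gp = paste0(AgeGroup, "_", Sex), .keep = "unused") %>% 
  # one row per group/marker pair: columns gp, name, value
  pivot_longer(-gp)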

#   A tibble: 12 × 3
#   gp    name    value
#   <chr> <chr>   <dbl>
# 1 O_f   marker1 0.5  
# 2 O_f   marker2 1    
# 3 O_f   marker3 0    
# 4 O_m   marker1 0.5  
# 5 O_m   marker2 0.5  
# 6 O_m   marker3 0    
# ...

# Plot: markers on the x-axis, groups on the y-axis, fill = proportion of "yes"
df1 %>% 
  ggplot() +
  aes(x = name, y = gp, fill = value) +
  geom_tile() +
  theme_minimal()

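One possible refinement of the plot (a sketch, not part of the original answer) prints the percentage in each tile and fixes the fill scale to the range 0 to 1, so colours keep the same meaning if the data change. It loads dplyr, tidyr and ggplot2 individually instead of tidyverse and uses `scales::percent()` from the scales package that ggplot2 depends on; the names `props` and `p` and the white-to-firebrick palette are illustrative choices.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question
df1 <- data.frame(
  ID       = 1:10,
  AgeGroup = c("Y", "O", "Y", "Y", "Y", "O", "Y", "O", "O", "Y"),
  Sex      = c("m", "m", "f", "f", "m", "f", "m", "m", "f", "f"),
  marker1  = c("yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "no"),
  marker2  = c("no", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "no"),
  marker3  = c("yes", "no", "no", "yes", "yes", "no", "yes", "no", "no", "yes")
)

# Same long table as in the answer; mean(.x == "yes") equals sum(.x == "yes") / n()
props <- df1 %>%
  group_by(AgeGroup, Sex) %>%
  summarise(across(starts_with("marker"), ~ mean(.x == "yes")), .groups = "drop") %>%
  mutate(gp = paste0(AgeGroup, "_", Sex), .keep = "unused") %>%
  pivot_longer(-gp)

# Tiles labelled with their percentage; fill limits fixed to [0, 1]
p <- ggplot(props, aes(x = name, y = gp, fill = value)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = scales::percent(value, accuracy = 1))) +
  scale_fill_gradient(low = "white", high = "firebrick",
                      limits = c(0, 1), labels = scales::percent) +
  labs(x = "Marker", y = "AgeGroup_Sex", fill = "Proportion of yes") +
  theme_minimal()

p
```

`.groups = "drop"` does the same job as the separate `ungroup()` call in the answer and also silences the grouping message from `summarise()`.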

Answer:

This will look familiar from your previous question, but here is a base R version using aggregate() and heatmap():

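# Proportion of "yes" per marker for each AgeGroup_Sex group
# (the mean of a logical vector is the share of TRUE values)
tally <- aggregate(
    df1[, -1:-3],
    by = list("Group" = paste(df1$AgeGroup, df1$Sex, sep = "_")),
    FUN = function(x) mean(x == "yes")
)

# Use the group labels as row names so heatmap() can label the rows
row.names(tally) <- tally$Group

# Drop the Group column; Rowv/Colv = NA suppress dendrograms and reordering
heatmap(as.matrix(tally[, -1]), scale = "none", Rowv = NA, Colv = NA)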

[Image: output of the heatmap function]

The usual heatmap options and caveats apply: you can change the colour scheme with the col parameter, swap the x and y axes by transposing the matrix with t(), and labels will start to disappear once there are too many rows or columns.
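For instance, the `col` and `t()` options mentioned above could be combined like this (a sketch; the `hcl.colors()` palette is an arbitrary choice, and any vector of colours works for `col`). Transposing puts the markers on the rows and the groups on the columns; `m` is a hypothetical name for the proportion matrix.

```r
# Sample data from the question
df1 <- data.frame(
  ID       = 1:10,
  AgeGroup = c("Y", "O", "Y", "Y", "Y", "O", "Y", "O", "O", "Y"),
  Sex      = c("m", "m", "f", "f", "m", "f", "m", "m", "f", "f"),
  marker1  = c("yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "no"),
  marker2  = c("no", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "no"),
  marker3  = c("yes", "no", "no", "yes", "yes", "no", "yes", "no", "no", "yes")
)

# Proportion of "yes" per AgeGroup_Sex group, as in the answer
tally <- aggregate(
  df1[, c("marker1", "marker2", "marker3")],
  by = list(Group = paste(df1$AgeGroup, df1$Sex, sep = "_")),
  FUN = function(x) mean(x == "yes")
)
m <- as.matrix(tally[, -1])
rownames(m) <- tally$Group

# t(m): markers become rows, groups become columns.
# col: sequential HCL palette; rev = TRUE reverses its default order.
heatmap(t(m), scale = "none", Rowv = NA, Colv = NA,
        col = hcl.colors(12, "YlOrRd", rev = TRUE))
```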
