Here is a sample dataset:
df1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
AgeGroup = c("Y","O","Y","Y","Y","O","Y","O","O","Y"),
Sex = c("m","m","f","f","m","f","m","m","f","f"),
marker1 = c('yes', 'yes', 'no', 'yes', 'no','no', 'yes', 'no', 'yes', 'no'),
marker2 = c('no', 'yes', 'no', 'yes', 'no','yes', 'yes', 'no', 'yes', 'no'),
marker3 = c('yes', 'no', 'no', 'yes', 'yes','no', 'yes', 'no','no', 'yes'))
df1
   ID AgeGroup Sex marker1 marker2 marker3
1   1        Y   m     yes      no     yes
2   2        O   m     yes     yes      no
3   3        Y   f      no      no      no
4   4        Y   f     yes     yes     yes
5   5        Y   m      no      no     yes
6   6        O   f      no     yes      no
7   7        Y   m     yes     yes     yes
8   8        O   m      no      no      no
9   9        O   f     yes     yes      no
10 10        Y   f      no      no     yes
All columns except ID are categorical variables. AgeGroup has two levels, Y (young) and O (old); Sex is m or f; and each marker column records whether that marker is present (yes) or absent (no). The goal is to investigate the presence of the three markers for each combination of AgeGroup and Sex. I would like to calculate, in a new table, the proportion of yes values for marker1, marker2, and marker3 in each of the four possible combinations of AgeGroup and Sex (Y_m, O_m, Y_f, O_f). Then I want to display these proportions in a heatmap where the markers are on the x-axis and the four newly defined groups are on the y-axis, so that each tile shows the proportion of yes for one marker in one group. Can you please help me with that?
CodePudding user response:
After some data manipulation, you can use geom_tile:
library(tidyverse)
#Data wrangling
df1 <-
df1 %>%
group_by(AgeGroup, Sex) %>%
summarise(across(starts_with("marker"),
~ sum(.x == "yes") / n())) %>%
ungroup() %>%
mutate(gp = paste0(AgeGroup, "_", Sex), .keep = "unused") %>%
pivot_longer(-gp)
# A tibble: 12 × 3
# gp name value
# <chr> <chr> <dbl>
# 1 O_f marker1 0.5
# 2 O_f marker2 1
# 3 O_f marker3 0
# 4 O_m marker1 0.5
# 5 O_m marker2 0.5
# 6 O_m marker3 0
# ...
#Plot
df1 %>%
  ggplot() +
  aes(x = name, y = gp, fill = value) +
  geom_tile() +
  theme_minimal()
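As a sanity check on the summarised table, one group can be recomputed by hand in base R. This is only a sketch: it rebuilds the sample data from the question, and the names old_f and check are made up for illustration. It relies on the fact that the mean of a logical vector is the share of TRUE values, which is the same quantity as sum(.x == "yes") / n().

```r
# Rebuild the sample data from the question
df1 <- data.frame(
  ID = 1:10,
  AgeGroup = c("Y","O","Y","Y","Y","O","Y","O","O","Y"),
  Sex = c("m","m","f","f","m","f","m","m","f","f"),
  marker1 = c("yes","yes","no","yes","no","no","yes","no","yes","no"),
  marker2 = c("no","yes","no","yes","no","yes","yes","no","yes","no"),
  marker3 = c("yes","no","no","yes","yes","no","yes","no","no","yes")
)

# Keep only the old/female rows (IDs 6 and 9); old_f is an illustrative name
old_f <- subset(df1, AgeGroup == "O" & Sex == "f")

# mean() of a logical vector gives the proportion of TRUE values
check <- sapply(old_f[c("marker1", "marker2", "marker3")],
                function(x) mean(x == "yes"))
print(check)
```

The three values should match the O_f rows of the tibble shown above (0.5, 1 and 0).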
CodePudding user response:
This will look familiar given your previous question, but here is a base R approach:
tally <- aggregate(
df1[,-1:-3],
by = list("Group" = paste(df1$AgeGroup, df1$Sex, sep="_")),
FUN = function(x) mean(x == "yes")  # proportion of "yes" per group, not a raw count
)
row.names(tally) <- tally$Group
heatmap(as.matrix(tally[,-1]), scale="none", Rowv = NA, Colv= NA)
The usual heatmap() options and caveats apply: you can change the colour scheme with the col parameter, flip x and y by using t() to transpose the matrix, and labels will start disappearing once you have too many rows or columns.
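The two options mentioned above, a custom colour scheme and a transposed matrix, could be combined roughly as follows. This is a sketch under a few assumptions: it rebuilds the sample data from the question, the names props, prop_mat and pal are illustrative, and the white-to-steelblue ramp is an arbitrary choice of palette.

```r
# Rebuild the sample data from the question
df1 <- data.frame(
  ID = 1:10,
  AgeGroup = c("Y","O","Y","Y","Y","O","Y","O","O","Y"),
  Sex = c("m","m","f","f","m","f","m","m","f","f"),
  marker1 = c("yes","yes","no","yes","no","no","yes","no","yes","no"),
  marker2 = c("no","yes","no","yes","no","yes","yes","no","yes","no"),
  marker3 = c("yes","no","no","yes","yes","no","yes","no","no","yes")
)

# Proportion of "yes" per marker within each AgeGroup_Sex group
props <- aggregate(
  df1[, c("marker1", "marker2", "marker3")],
  by = list(Group = paste(df1$AgeGroup, df1$Sex, sep = "_")),
  FUN = function(x) mean(x == "yes")
)

# Numeric matrix with groups as rows and markers as columns
prop_mat <- as.matrix(props[, -1])
rownames(prop_mat) <- props$Group

# 20-step palette from white (low) to steelblue (high); arbitrary choice
pal <- colorRampPalette(c("white", "steelblue"))(20)

# t() swaps the axes: markers become rows, groups become columns
heatmap(t(prop_mat), scale = "none", Rowv = NA, Colv = NA, col = pal)
```

Dropping the t() call restores the original orientation, with the groups as rows of the plotted matrix.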