I have a dataframe of counts of cases categorised by 5 attributes; each attribute is either TRUE, FALSE or NA. I'm trying to find a way of presenting the counts of the different combinations of attributes as a plot in R. Here is some data (the real data has a few more attributes):
> df <- data.frame( attribute_A = c(T,F,T,NA,F), attribute_B = c(F,T,T,NA,T), attribute_C = c(T,T,F,F,F), attribute_D = c(T,T,NA,NA,F), count=c(100,55,34,12,3))
> df
  attribute_A attribute_B attribute_C attribute_D count
1        TRUE       FALSE        TRUE        TRUE   100
2       FALSE        TRUE        TRUE        TRUE    55
3        TRUE        TRUE       FALSE          NA    34
4          NA          NA       FALSE          NA    12
5       FALSE        TRUE       FALSE       FALSE     3
Here is a mock-up of a barplot I thought I could make to present the counts of cases with the different combinations of attributes:
I'm probably not searching with the right keywords, but I can't find any examples of the type of plot I want to make, so I'm not sure how to do it. Do I need to combine a heatmap with a barplot, or is there a better way of achieving this? What is a plot like this actually called (if it has a name)?
Answer:
Not exactly what you want, but it might be a start. Edit: added a dirty hack to get the column names into an empty bar.
library(tidyverse)
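df %>%
  # dirty hack: append a row holding the attribute names, with a count of 0
  rbind(c(names(df)[1:4], 0)) %>%
  unite("label", starts_with("att"), sep = "\n") %>%
  # rbind() coerced the columns to character, so turn count back into a number
  mutate(count = as.numeric(count)) %>%
  ggplot(aes(x = label, y = count)) +
  geom_col() +
  labs(x = "")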
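The bars in the answer above come out in alphabetical order of their labels, because ggplot2 sorts a character x axis alphabetically. If ordering by count is preferred, one option is to turn the label into a factor ordered by descending count with base R's `reorder()`. This is a sketch that assumes the question's `df` with the first column spelled `attribute_A`; it leaves out the header-bar hack, so the attribute names are not shown, and `plot_data` is just an illustrative name.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  attribute_A = c(TRUE, FALSE, TRUE, NA, FALSE),
  attribute_B = c(FALSE, TRUE, TRUE, NA, TRUE),
  attribute_C = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  attribute_D = c(TRUE, TRUE, NA, NA, FALSE),
  count = c(100, 55, 34, 12, 3)
)

# Stack each row's attribute values into one multi-line label
# (unite() writes NA as the string "NA" because na.rm defaults to FALSE),
# then make the label a factor whose levels run from largest to smallest count.
plot_data <- df %>%
  unite("label", starts_with("attribute"), sep = "\n") %>%
  mutate(label = reorder(label, -count))

p <- ggplot(plot_data, aes(x = label, y = count)) +
  geom_col() +
  labs(x = "")
```

`reorder()` sorts levels by the mean of the second argument within each level; since every label here is unique, that is simply `-count`, so the tallest bar ends up on the left.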
Answer:
If you want to do it all in a single ggplot call, without stitching plots together, you could do:
library(tidyverse)
ystep <- max(df$count, na.rm = TRUE) / 5
df %>% mutate(A = paste(attribute_A), B = paste(attribute_B),
C = paste(attribute_C), D = paste(attribute_D)) %>%
ggplot(aes(x = 1:5, y = count)) +
geom_col(fill = 'deepskyblue3', color = 'deepskyblue4', width = 0.5) +
scale_y_continuous(limits = c(-max(df$count), max(df$count)),
breaks = c(4:1 * -ystep, pretty(df$count)),
labels = c(names(df[4:1]), pretty(df$count))) +
# one row of coloured tiles per attribute, stacked below the zero line
geom_tile(aes(y = -1 * ystep, fill = A, color = A), height = ystep) +
geom_tile(aes(y = -2 * ystep, fill = B, color = B), height = ystep) +
geom_tile(aes(y = -3 * ystep, fill = C, color = C), height = ystep) +
geom_tile(aes(y = -4 * ystep, fill = D, color = D), height = ystep) +
# print TRUE / FALSE / NA on top of each tile
geom_text(aes(y = -1 * ystep, label = A, color = A), fontface = 2) +
geom_text(aes(y = -2 * ystep, label = B, color = B), fontface = 2) +
geom_text(aes(y = -3 * ystep, label = C, color = C), fontface = 2) +
geom_text(aes(y = -4 * ystep, label = D, color = D), fontface = 2) +
# levels sort as "FALSE", "NA", "TRUE"
scale_fill_manual(values = c('pink', 'gold2', 'lightgreen')) +
scale_color_manual(values = c('red4', 'orange4', 'green4')) +
theme_minimal() +
theme(panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
axis.text.x = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_text(hjust = 0.75),
legend.position = 'none')
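The four near-identical `geom_tile()`/`geom_text()` pairs above have to be extended by hand for every extra attribute, and the question mentions the real data has more. A minimal sketch of the same single-call idea that scales to any number of columns is to reshape the grid cells to long format with `tidyr::pivot_longer()`, so that one `geom_tile()` and one `geom_text()` layer draw every attribute row. It assumes all attribute columns share the prefix `attribute_` (adjust the regex otherwise); `attrs`, `bars` and `cells` are illustrative names, not part of the answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  attribute_A = c(TRUE, FALSE, TRUE, NA, FALSE),
  attribute_B = c(FALSE, TRUE, TRUE, NA, TRUE),
  attribute_C = c(TRUE, TRUE, FALSE, FALSE, FALSE),
  attribute_D = c(TRUE, TRUE, NA, NA, FALSE),
  count = c(100, 55, 34, 12, 3)
)

# Assumption: every attribute column name starts with "attribute_"
attrs <- grep("^attribute_", names(df), value = TRUE)
ystep <- max(df$count) / 5

# One row per bar, with an explicit x position
bars <- df %>% mutate(bar = row_number())

# Long format: one row per (bar, attribute) cell of the grid under the bars.
# paste() turns TRUE/FALSE/NA into the strings "TRUE"/"FALSE"/"NA";
# attribute k is drawn at y = -k * ystep, below the zero line.
cells <- bars %>%
  mutate(across(all_of(attrs), ~ paste(.x))) %>%
  pivot_longer(all_of(attrs), names_to = "attribute", values_to = "value") %>%
  mutate(y = -ystep * match(attribute, attrs))

p <- ggplot() +
  geom_col(data = bars, aes(x = bar, y = count),
           fill = "deepskyblue3", color = "deepskyblue4", width = 0.5) +
  geom_tile(data = cells, aes(x = bar, y = y, fill = value, color = value),
            height = ystep) +
  geom_text(data = cells, aes(x = bar, y = y, label = value, color = value),
            fontface = 2) +
  # attribute names label the grid rows, ordinary counts label the bars
  scale_y_continuous(breaks = c(-ystep * seq_along(attrs), pretty(df$count)),
                     labels = c(attrs, pretty(df$count))) +
  scale_fill_manual(values = c("FALSE" = "pink", "NA" = "gold2", "TRUE" = "lightgreen")) +
  scale_color_manual(values = c("FALSE" = "red4", "NA" = "orange4", "TRUE" = "green4")) +
  theme_minimal() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        legend.position = "none")
```

Naming the colour values by level (`"FALSE" = ...`) keeps the mapping correct even if one of the three states is absent from the data, which the unnamed vectors in the answer above rely on implicitly.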