Stacked bar chart of count across multiple columns

Time:08-20

My data looks like this:

a b c d e f
High Low High Medium Low High
High Medium Low High Medium Low
High Low High Medium Low High
High Medium Low High High Medium

I am trying to create a stacked bar chart with one bar per column (a, b, c, etc.) on the x axis and, on the y axis, the count of High, Medium, and Low values in that column, so that each bar is split into three stacks.

I can use the following code to produce a new dataframe for the count of occurrences in a single column and then plot that in a bar chart, but I'm struggling to combine this into a single stacked bar, then repeat for all the columns.

library(dplyr)
newdf = df %>% dplyr::count(a)

Ideally this would also be a horizontal chart but I can figure that out later.

CodePudding user response:

You need to pivot into long format, then count unique pairings:

library(tidyverse)

df %>%
  pivot_longer(everything()) %>%
  count(name, value) %>%
  ggplot(aes(name, n, fill = factor(value, c("High", "Medium", "Low")))) +
  geom_col(color = "gray50") +
  scale_fill_brewer(palette = "YlGnBu") +
  labs(x = "Column", y = "Count", fill = NULL) +
  theme_minimal(base_size = 16)
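
The question also mentions wanting a horizontal chart. A minimal sketch of one way to get that, assuming the same data as in the reproducible example further down (rebuilt here with data.frame() so the block runs on its own): keep the counts in an intermediate object and add coord_flip() to the plot. The object names counts and p are illustrative and not part of the original answer.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Same values as the df in the reproducible data section
df <- data.frame(
  a = c("High", "High", "High", "High"),
  b = c("Low", "Medium", "Low", "Medium"),
  c = c("High", "Low", "High", "Low"),
  d = c("Medium", "High", "Medium", "High"),
  e = c("Low", "Medium", "Low", "High"),
  f = c("High", "Low", "High", "Medium")
)

# Long format: one row per (column, level) pair that occurs, with its frequency in n
counts <- df %>%
  pivot_longer(everything()) %>%
  count(name, value)

# Same stacked bars as above, flipped so the bars run horizontally
p <- ggplot(counts, aes(name, n, fill = factor(value, c("High", "Medium", "Low")))) +
  geom_col(color = "gray50") +
  coord_flip() +
  labs(x = "Column", y = "Count", fill = NULL)
p
```

Printing counts before plotting is a quick way to check that each column's frequencies add up to the number of rows.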


Data in reproducible format

df <- structure(list(a = c("High", "High", "High", "High"), b = c("Low", 
"Medium", "Low", "Medium"), c = c("High", "Low", "High", "Low"
), d = c("Medium", "High", "Medium", "High"), e = c("Low", "Medium", 
"Low", "High"), f = c("High", "Low", "High", "Medium")), 
class = "data.frame", row.names = c(NA, -4L))

df
#>      a      b    c      d      e      f
#> 1 High    Low High Medium    Low   High
#> 2 High Medium  Low   High Medium    Low
#> 3 High    Low High Medium    Low   High
#> 4 High Medium  Low   High   High Medium

Created on 2022-08-19

CodePudding user response:

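If the columns are not factors yet, convert them first. Give every column the same levels, so that each integer code refers to the same category in every column:

dat[] <- lapply(dat, factor, levels = c("High", "Medium", "Low"))

or, to use factors only temporarily:

sapply(lapply(dat, factor, levels = c("High", "Medium", "Low")), tabulate, nbins = 3) |> barplot()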
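
A possible base R variant that labels the stacks, sketched under the assumption that the three categories are exactly High, Medium, and Low. Calling as.factor() on each column separately derives the levels from the values present in that column, so the same integer code can stand for different categories in different columns. Using table() on factors with one shared level set (the hypothetical vector lvls below) avoids that and keeps the category names as row names.

```r
# Same values as the question's data, as plain character columns
dat <- data.frame(
  a = c("High", "High", "High", "High"),
  b = c("Low", "Medium", "Low", "Medium"),
  c = c("High", "Low", "High", "Low"),
  d = c("Medium", "High", "Medium", "High"),
  e = c("Low", "Medium", "Low", "High"),
  f = c("High", "Low", "High", "Medium")
)

# Shared level order (an assumption about which categories exist)
lvls <- c("High", "Medium", "Low")

# table() on a factor with fixed levels keeps zero counts, so every column
# yields a length-3 result and sapply() simplifies to a 3 x 6 matrix
counts <- sapply(dat, function(x) table(factor(x, levels = lvls)))

# Stacked bars, one per column; horiz = TRUE draws them horizontally
barplot(counts, horiz = TRUE, legend.text = TRUE,
        xlab = "Count", ylab = "Column")
```

Because the matrix rows are named, legend.text = TRUE can pick up the category labels directly.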

Data:

dat <- structure(list(a = structure(c(1L, 1L, 1L, 1L), levels = "High", class = "factor"), 
    b = structure(c(1L, 2L, 1L, 2L), levels = c("Low", "Medium"
    ), class = "factor"), c = structure(c(1L, 2L, 1L, 2L), levels = c("High", 
    "Low"), class = "factor"), d = structure(c(2L, 1L, 2L, 1L
    ), levels = c("High", "Medium"), class = "factor"), e = structure(c(2L, 
    3L, 2L, 1L), levels = c("High", "Low", "Medium"), class = "factor"), 
    f = structure(c(1L, 2L, 1L, 3L), levels = c("High", "Low", 
    "Medium"), class = "factor")), row.names = c(NA, -4L), class = "data.frame")