Add overall bar and perc labels to geom_bar

Time:12-28

I'm looking for a solution for the following problem: I have data that contains two factor variables EDU and LEVEL. The reproducible data sample is here:

df <- structure(list(EDU = structure(c(3L, 1L, 2L, 2L, 3L, 2L, 3L, 
2L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 2L, 3L, 3L, 1L, 2L, 3L, 2L, 
2L, 2L, 1L, 1L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 
3L, 1L, 1L, 3L, 3L, 3L, 3L, 2L, 1L, 3L, 1L), .Label = c("A", 
"B", "C"), class = "factor"), LEVEL = structure(c(3L, 3L, 4L, 
2L, 4L, 3L, 1L, 2L, 2L, 1L, 3L, 2L, 3L, 2L, 3L, 3L, 4L, 2L, 2L, 
4L, 1L, 2L, 3L, 3L, 1L, 4L, 2L, 3L, 1L, 1L, 2L, 3L, 1L, 2L, 1L, 
4L, 3L, 1L, 4L, 3L, 4L, 1L, 4L, 2L, 4L, 1L, 1L, 4L, 3L, 1L), .Label = c("1", 
"2", "3", "4"), class = "factor")), class = "data.frame", row.names = c(NA, 
-50L))

Using this data, I want to draw a bar plot with ggplot2 that shows the grouping variable EDU on the x-axis and the cumulative percentages of LEVEL on the y-axis. In addition, I want a fourth bar that shows the percentages of LEVEL without grouping by EDU, a kind of "overall bar". I also want percentage labels inside the plot, so that every LEVEL is labelled with its relative frequency, like in this example plot

That plot looks good so far. As mentioned above, though, my aim is to add percentage labels (probably with geom_text) and a fourth "overall bar" next to the three existing ones. For the percentage labels, I also tried building a table with prop.table and adding the labels from it with annotate:

props <- prop.table(table(df$EDU, df$LEVEL), margin=1)

library(ggplot2)

ggplot(df, aes(x=EDU, fill=LEVEL)) +
  geom_bar(position="fill") +
  scale_y_continuous(labels = scales::percent) +
  annotate("text", x="A", y=.15, label=scales::percent(props[1,4])) +
  annotate("text", x="B", y=.10, label=scales::percent(props[2,4])) +
  annotate("text", x="C", y=.275, label=scales::percent(props[3,4])) +

  annotate("text", x="A", y=.375, label=scales::percent(props[1,3])) +
  annotate("text", x="B", y=.275, label=scales::percent(props[2,3])) +
  annotate("text", x="C", y=.625, label=scales::percent(props[3,3])) +

  annotate("text", x="A", y=.66, label=scales::percent(props[1,2])) +
  annotate("text", x="B", y=.5, label=scales::percent(props[2,2])) +
  annotate("text", x="C", y=.78, label=scales::percent(props[3,2])) +

  annotate("text", x="A", y=.9, label=scales::percent(props[1,1])) +
  annotate("text", x="B", y=.9, label=scales::percent(props[2,1])) +
  annotate("text", x="C", y=.9, label=scales::percent(props[3,1]))

That results in the following plot: Example 2

This seems cumbersome to me, especially when I want to create more than one plot and have to annotate each percentage separately. So the question is how to set the y arguments in annotate in an automated way, so that R positions the labels for me.

Regarding the "overall bar" problem I have no idea how to solve this, unfortunately.

I'm grateful for any help!

CodePudding user response:

Rest assured: the more experienced you get, the less you will be afraid of preparing the data beforehand. It is often much easier and cleaner to first shape the data into what you want to plot, and then plot it. Don't try to do everything within ggplot2; that can get quite painful.

See the comments in the code:

library(tidyverse)

##  create a percentage column manually
df_perc <- 
  df %>% 
  count(EDU, LEVEL) %>%
  group_by(EDU) %>%
  mutate(perc = n*100/sum(n)) 

## for the total, create a new data frame and bind to the old one
total <- 
  df_perc %>%
  group_by(LEVEL) %>%
  summarise(n = sum(n)) %>%
  ## ungroup for the total
  ungroup() %>%
  ## add EDU column called total, so you can bind it and plot it easily 
  mutate(perc= n*100/sum(n), EDU = "Total")

## now bind them and plot them
bind_rows(df_perc, total) %>%
ggplot(aes(x=EDU, y = perc, fill=LEVEL)) +
  ## use geom_col, and remove position = fill
  geom_col() +
  # now you can add the labels easily as per all those threads
  geom_text(aes(label = paste(round(perc, 2), "%")), position = position_stack(vjust = .5)) +
  ## you can either change the y values, or use a different scale factor
  scale_y_continuous("Percent", labels = function(x) scales::percent(x, scale = 1))
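
One possible variation, as a sketch and not part of the answer above: bind_rows() combines the factor EDU with the character value "Total", so the bar order ends up alphabetical, which only happens to put "Total" last here. The version below sets the order explicitly and keeps proportions on a 0-1 scale, so scales::percent can format both the labels and the axis. The names dat, by_edu, overall and plot_data are made up for illustration, and dat is a small stand-in data set; swap in the question's df.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in data; replace `dat` with the question's `df`.
dat <- data.frame(
  EDU   = factor(c("A", "A", "A", "B", "B", "C", "C", "C")),
  LEVEL = factor(c("1", "2", "1", "1", "3", "2", "4", "4"))
)

# Proportions of LEVEL within each EDU group (0-1 scale)
by_edu <- dat %>%
  count(EDU, LEVEL) %>%
  group_by(EDU) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup() %>%
  mutate(EDU = as.character(EDU))

# Proportions of LEVEL over all rows, labelled "Total"
overall <- dat %>%
  count(LEVEL) %>%
  mutate(prop = n / sum(n), EDU = "Total")

# Bind both tables and fix the bar order: original groups first, "Total" last
plot_data <- bind_rows(by_edu, overall) %>%
  mutate(EDU = factor(EDU, levels = c(levels(dat$EDU), "Total")))

p <- ggplot(plot_data, aes(x = EDU, y = prop, fill = LEVEL)) +
  geom_col() +
  # position_stack(vjust = 0.5) centres each label in its segment
  geom_text(
    aes(label = scales::percent(prop, accuracy = 0.1)),
    position = position_stack(vjust = 0.5)
  ) +
  scale_y_continuous("Percent", labels = scales::percent)

print(p)
```

Because the order comes from levels(dat$EDU), this also works when the group names would sort after "Total" alphabetically.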
