I'm looking for a solution for the following problem:
I have data that contains two factor variables, EDU and LEVEL. A reproducible data sample is here:
df <- structure(list(EDU = structure(c(3L, 1L, 2L, 2L, 3L, 2L, 3L,
2L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 2L, 3L, 3L, 1L, 2L, 3L, 2L,
2L, 2L, 1L, 1L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 2L, 3L,
3L, 1L, 1L, 3L, 3L, 3L, 3L, 2L, 1L, 3L, 1L), .Label = c("A",
"B", "C"), class = "factor"), LEVEL = structure(c(3L, 3L, 4L,
2L, 4L, 3L, 1L, 2L, 2L, 1L, 3L, 2L, 3L, 2L, 3L, 3L, 4L, 2L, 2L,
4L, 1L, 2L, 3L, 3L, 1L, 4L, 2L, 3L, 1L, 1L, 2L, 3L, 1L, 2L, 1L,
4L, 3L, 1L, 4L, 3L, 4L, 1L, 4L, 2L, 4L, 1L, 1L, 4L, 3L, 1L), .Label = c("1",
"2", "3", "4"), class = "factor")), class = "data.frame", row.names = c(NA,
-50L))
Using this data, I want to draw a bar plot with ggplot2 that shows the grouping variable EDU on the x-axis and the cumulative percentages of LEVEL on the y-axis. Additionally, I want to add a fourth bar that contains the percentages of LEVEL without grouping by EDU, somewhat like an "overall bar". Furthermore, I want to add percentage labels within the plot, so that every LEVEL is labelled with its corresponding relative frequency, like in this [image].
That plot looks good so far. But as mentioned above, my aim is to add percentage labels, probably with geom_text, AND a fourth "overall bar" beside the three existing ones. For the percentage labels I also tried building a prop.table and adding the labels from the corresponding proportions with annotate:
props <- prop.table(table(df$EDU, df$LEVEL), margin=1)
ggplot(df, aes(x=EDU, fill=LEVEL)) +
  geom_bar(position="fill") +
  scale_y_continuous(labels = scales::percent) +
  annotate("text", x="A", y=.15, label=scales::percent(props[1,4])) +
  annotate("text", x="B", y=.10, label=scales::percent(props[2,4])) +
  annotate("text", x="C", y=.275, label=scales::percent(props[3,4])) +
  annotate("text", x="A", y=.375, label=scales::percent(props[1,3])) +
  annotate("text", x="B", y=.275, label=scales::percent(props[2,3])) +
  annotate("text", x="C", y=.625, label=scales::percent(props[3,3])) +
  annotate("text", x="A", y=.66, label=scales::percent(props[1,2])) +
  annotate("text", x="B", y=.5, label=scales::percent(props[2,2])) +
  annotate("text", x="C", y=.78, label=scales::percent(props[3,2])) +
  annotate("text", x="A", y=.9, label=scales::percent(props[1,1])) +
  annotate("text", x="B", y=.9, label=scales::percent(props[2,1])) +
  annotate("text", x="C", y=.9, label=scales::percent(props[3,1]))
That results in the following plot:
This seems cumbersome to me, especially when I want to create more than one plot and have to annotate each percentage separately. So the question is how I can set the y arguments in annotate in an automated way, so that R positions the labels for me.
Regarding the "overall bar" problem I have no idea how to solve this, unfortunately.
I'm grateful for any help!
Answer:
Rest assured: the more experienced you get, the less you will be afraid of preparing the data beforehand. It is often much easier and cleaner to first shape the data into exactly what you want to plot, and then plot it. Don't try to do everything within ggplot2; that can get quite painful.
See the comments in the code:
library(tidyverse)
## create a percentage column manually
df_perc <-
  df %>%
  count(EDU, LEVEL) %>%
  group_by(EDU) %>%
  mutate(perc = n*100/sum(n))
## for the total, create a new data frame and bind to the old one
total <-
  df_perc %>%
  group_by(LEVEL) %>%
  summarise(n = sum(n)) %>%
  ## ungroup for the total
  ungroup() %>%
  ## add EDU column called total, so you can bind it and plot it easily
  mutate(perc = n*100/sum(n), EDU = "Total")
## now bind them and plot them
bind_rows(df_perc, total) %>%
  ggplot(aes(x=EDU, y = perc, fill=LEVEL)) +
  ## use geom_col, and remove position = fill
  geom_col() +
  # now you can add the labels easily as per all those threads
  geom_text(aes(label = paste(round(perc, 2), "%")), position = position_stack(vjust = .5)) +
  ## you can either change the y values, or use a different scale factor
  scale_y_continuous("Percent", labels = function(x) scales::percent(x, scale = 1))
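If you would rather start from the prop.table attempt in the question, the same "prepare first, then plot" idea can be sketched in base R. This is only a minimal sketch under a few assumptions: the small df below is a made-up toy data frame (not the data from the question), and the names tab_total, plot_df and p are just illustrative. The "Total" row is appended to the count table before the proportions are taken, and position_stack(vjust = 0.5) then centres each label in its segment, so no manual y values are needed.

```r
library(ggplot2)

# Toy data for illustration only (assumption: not the question's data)
df <- data.frame(
  EDU   = factor(c("A", "A", "B", "B", "B", "C", "C", "C")),
  LEVEL = factor(c("1", "2", "1", "1", "2", "1", "2", "2"))
)

# Counts per EDU x LEVEL, plus one extra row with the overall counts
tab <- table(df$EDU, df$LEVEL)
tab_total <- rbind(tab, Total = colSums(tab))

# Row-wise proportions: every bar (including "Total") sums to 1
props <- prop.table(tab_total, margin = 1)

# Long format for ggplot2: one row per bar segment
plot_df <- as.data.frame(as.table(props))
names(plot_df) <- c("EDU", "LEVEL", "prop")

p <- ggplot(plot_df, aes(x = EDU, y = prop, fill = LEVEL)) +
  geom_col() +
  # position_stack(vjust = 0.5) places each label in the middle of its segment
  geom_text(aes(label = scales::percent(prop, accuracy = 1)),
            position = position_stack(vjust = 0.5)) +
  scale_y_continuous("Percent", labels = scales::percent)

print(p)
```

Because the proportions stay on a 0 to 1 scale here, plain scales::percent can be used for both the labels and the axis. With real data you may want to drop segments whose proportion is zero before labelling, since they would otherwise get a "0%" label.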