I have not been working with R for long, but I have already found many answers to my questions in this community. Now I am stuck and am asking my first question of my own.
Objective: I want to display values from different years (here in the example 10 years) over time in a barplot. Each year should be represented by a column. The years 1 to 9 should get a uniform color, the 10th year another. For the 10th year the value should also be displayed. There should be only two entries in the legend: "Year 1 - 9" and "Year 10".
I have created the following dummy data set:
library(ggplot2)
library(dplyr) # provides %>%, summarise() and n_distinct()
# texts to display
tit <- "Title"
subtit <- "Subtitle"
lab <- c("lab1", "lab2", "lab3", "lab4")
# prepare dataset with random data
n_label <- length(lab)
# "year01" ... "year10", each repeated once per label
cohort <- rep(sprintf("year%02d", 1:10), each = n_label)
data_rel <- runif(40, min = 0, max = .5)
# lab is recycled to match the 40 rows of cohort and data_rel
df_data <- data.frame(lab, cohort, data_rel)
# number of distinct cohorts (10 here)
n_cohort <- df_data %>% summarise(count = n_distinct(cohort))
I was able to implement the plot as desired with the following code:
# plot data
df_data %>%
  ggplot() +
  geom_bar(
    aes(
      x = factor(lab, levels = c("lab1", "lab2", "lab3", "lab4")),
      y = data_rel,
      fill = cohort
    ),
    stat = "identity",
    position = position_dodge()
  ) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  theme_classic() +
  theme(
    legend.position = "bottom",
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(hjust = 0.5)
  ) +
  # value labels for year 10 only
  geom_text(
    data = subset(df_data, cohort == "year10"),
    aes(
      x = lab,
      y = data_rel,
      label = paste0(sprintf("%.1f", data_rel * 100), "%")
    ),
    vjust = -1,
    hjust = -1.5,
    size = 3
  ) +
  scale_fill_manual(
    values = c("#7F7F7F", "#389DC3"),
    limits = c("year01", "year10"),
    labels = c("Year 1 - 9", "Year 10")
  ) +
  labs(
    subtitle = paste(subtit),
    title = stringr::str_wrap(tit, 45),
    x = "",
    y = "",
    fill = ""
  )
Unfortunately, I cannot adjust the colors of the columns for years 1 - 9. Either not all columns get the correct color, or I get unwanted entries in the legend.
Does anyone have an idea what I am doing wrong? I am grateful for every hint!
CodePudding user response:
When setting the fill aesthetic, you can group all other levels of the factor together (here using forcats::fct_other to collapse years 1-9 into one level), which gives you your two levels of fill colour. At the same time, setting group = cohort keeps the bars separate:
library(forcats)
# plot data
df_data %>%
  ggplot() +
  geom_bar(
    aes(
      x = factor(lab, levels = c("lab1", "lab2", "lab3", "lab4")),
      y = data_rel,
      group = cohort, # one bar per cohort
      # keep "year10"; relabel every other level as "year01"
      fill = fct_other(cohort, "year10", other_level = "year01")
    ),
    stat = "identity",
    position = position_dodge()
  ) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  theme_classic() +
  theme(
    legend.position = "bottom",
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(hjust = 0.5)
  ) +
  # value labels for year 10 only
  geom_text(
    data = subset(df_data, cohort == "year10"),
    aes(
      x = lab,
      y = data_rel,
      label = paste0(sprintf("%.1f", data_rel * 100), "%")
    ),
    vjust = -1,
    hjust = -1.5,
    size = 3
  ) +
  scale_fill_manual(
    values = c("#7822CC", "#389DC3"),
    limits = c("year01", "year10"),
    labels = c("Year 1 - 9", "Year 10")
  ) +
  labs(
    subtitle = paste(subtit),
    title = stringr::str_wrap(tit, 45),
    x = "",
    y = "",
    fill = ""
  )
(Changed manual fill colour to distinguish from unfilled bars)
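To see what the collapsing step does on its own, here is a minimal sketch on a toy factor. The name cohort_demo is made up for illustration and is not part of the question's data; the call uses the same arguments as in the plot above.

```r
library(forcats)

# Toy factor standing in for the cohort column (hypothetical data)
cohort_demo <- factor(c("year01", "year02", "year09", "year10"))

# Keep "year10"; every other level is relabelled "year01"
lumped <- fct_other(cohort_demo, keep = "year10", other_level = "year01")

levels(lumped) # only two levels remain
table(lumped)  # three values collapsed into "year01", one "year10"
```

Because only two levels are left, scale_fill_manual() needs just two colours and the legend shows just two entries.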
It is also possible to create a new two-level cohort_lumped variable before passing the data to ggplot(), but the approach above keeps your data unchanged until the plotting stage and avoids an extra column that stores essentially the same information.
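For comparison, the extra-column variant might look roughly like the sketch below. It assumes a small stand-in data frame, df_demo, with the same column names as df_data; the use of dplyr::if_else() and geom_col() is my choice for illustration and not something taken from the code above.

```r
library(dplyr)
library(ggplot2)

# Small stand-in for df_data (hypothetical values)
set.seed(1)
df_demo <- data.frame(
  lab = rep(c("lab1", "lab2"), times = 10),
  cohort = rep(sprintf("year%02d", 1:10), each = 2),
  data_rel = runif(20, min = 0, max = 0.5)
)

# Two-level helper column: year 10 versus everything else
df_demo <- df_demo %>%
  mutate(cohort_lumped = if_else(cohort == "year10", "Year 10", "Year 1 - 9"))

# group keeps one bar per cohort; fill uses only the two lumped levels
p <- ggplot(df_demo, aes(x = lab, y = data_rel,
                         group = cohort, fill = cohort_lumped)) +
  geom_col(position = position_dodge()) +
  scale_fill_manual(values = c("Year 1 - 9" = "#7822CC",
                               "Year 10" = "#389DC3")) +
  labs(x = NULL, y = NULL, fill = NULL)

print(p)
```

The trade-off is the one described above: the legend labels live in the data, at the cost of an additional column.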