ggplot: change color of bars and not show all labels in legend

Time:08-17

I have not been working with R for long, but I have already found many answers to my questions in this community. Now I am stuck and am asking my first question of my own.

Objective: I want to display values from different years (here in the example 10 years) over time in a barplot. Each year should be represented by a column. The years 1 to 9 should get a uniform color, the 10th year another. For the 10th year the value should also be displayed. There should be only two entries in the legend: "Year 1 - 9" and "Year 10".

I have created the following dummy data set:

library(ggplot2)
library(dplyr)    # provides %>%, summarise() and n_distinct()
library(stringr)  # provides str_wrap(), used for the plot title

# texts to display
tit <- "Title"
subtit <- "Subtitle"
lab <- c("lab1", "lab2", "lab3", "lab4")

# prepare dataset with random data
n_label <- length(lab)
# one cohort label per year, each repeated once per bar label
cohort <- rep(sprintf("year%02d", 1:10), each = n_label)
data_rel <- runif(length(cohort), min = 0, max = .5)
df_data <- data.frame(lab, cohort, data_rel)  # lab is recycled across the cohorts
n_cohort <- df_data %>% summarise(count = n_distinct(cohort))

I was able to implement most of the plot with the following code:

# plot data
df_data %>%
  ggplot() +
  geom_bar(
    aes(
      x = factor(lab, levels = c("lab1", "lab2", "lab3", "lab4")),
      y = data_rel,
      fill = cohort
    ),
    stat = "identity",
    position = position_dodge()
  ) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  theme_classic() +
  theme(
    legend.position = "bottom",
    plot.title = element_text(hjust = 0.5,
                              size = 14,
                              face = "bold"),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(hjust = 0.5)
  ) +
  # value labels for year 10 only
  geom_text(
    data = subset(df_data, cohort == "year10"),
    aes(
      x = lab,
      y = data_rel,
      label = paste0(sprintf("%.1f", data_rel * 100), "%")
    ),
    vjust = -1,
    hjust = -1.5,
    size = 3
  ) +
  scale_fill_manual(
    values = c("#7F7F7F", "#389DC3"),
    limits = c("year01", "year10"),
    labels = c("Year 1 - 9", "Year 10")
  ) +
  labs(
    subtitle = subtit,
    title = stringr::str_wrap(tit, 45),
    x = "",
    y = "",
    fill = ""
  )

Unfortunately, I cannot adjust the colors of the columns for years 1 - 9. Either not all columns get the correct color, or I get unwanted entries in the legend.

Does anyone have an idea what I am doing wrong? I am grateful for every hint!

CodePudding user response:

When setting the fill aesthetic, you can lump all other levels of the factor together (here using forcats::fct_other to collapse years 1-9 into one level), which leaves just two levels for the fill colours. At the same time, group = cohort keeps the bars separate:

library(forcats)

# plot data
df_data %>%
  ggplot() +
  geom_bar(
    aes(
      x = factor(lab, levels = c("lab1", "lab2", "lab3", "lab4")),
      y = data_rel,
      group = cohort,  # keeps one bar per cohort
      fill = fct_other(cohort, "year10", other_level = "year01")  # two fill levels only
    ),
    stat = "identity",
    position = position_dodge()
  ) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  theme_classic() +
  theme(
    legend.position = "bottom",
    plot.title = element_text(hjust = 0.5,
                              size = 14,
                              face = "bold"),
    plot.subtitle = element_text(hjust = 0.5),
    plot.caption = element_text(hjust = 0.5)
  ) +
  # value labels for year 10 only
  geom_text(
    data = subset(df_data, cohort == "year10"),
    aes(
      x = lab,
      y = data_rel,
      label = paste0(sprintf("%.1f", data_rel * 100), "%")
    ),
    vjust = -1,
    hjust = -1.5,
    size = 3
  ) +
  scale_fill_manual(
    values = c("#7822CC", "#389DC3"),
    limits = c("year01", "year10"),
    labels = c("Year 1 - 9", "Year 10")
  ) +
  labs(
    subtitle = subtit,
    title = stringr::str_wrap(tit, 45),
    x = "",
    y = "",
    fill = ""
  )

(Changed manual fill colour to distinguish from unfilled bars)
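
To see what fct_other does in isolation, here is a minimal sketch on a made-up four-element vector. The names cohort_demo and fill_demo are hypothetical and not part of the question's data:

```r
library(forcats)

# Hypothetical mini-example: three cohorts instead of ten
cohort_demo <- c("year01", "year02", "year10", "year10")

# Keep "year10" as it is and relabel every other level as "year01"
fill_demo <- fct_other(cohort_demo, keep = "year10", other_level = "year01")

levels(fill_demo)  # only two levels remain
table(fill_demo)   # how many values ended up in each level
```

Since scale_fill_manual() in the plot sets limits = c("year01", "year10"), the colours and legend labels should follow that order regardless of how the factor levels themselves are ordered.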

It's also possible to do this by creating a new two-level cohort_lumped variable before passing the data to ggplot(), but the approach above keeps your data unchanged up to the point where it enters the graphing stage (and doesn't need an extra column storing essentially the same information).
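
For comparison, the extra-column alternative could look roughly like the sketch below. It rebuilds the dummy data from the question, stores the display labels directly in cohort_lumped, and uses geom_col() as a shorthand for geom_bar(stat = "identity"). The choice of label strings and the named colour vector are assumptions made for this illustration, and the title and value labels are left out for brevity:

```r
library(ggplot2)

# Rebuild the dummy data from the question (values are random)
lab <- c("lab1", "lab2", "lab3", "lab4")
df_data <- data.frame(
  lab      = rep(lab, times = 10),
  cohort   = rep(sprintf("year%02d", 1:10), each = length(lab)),
  data_rel = runif(40, min = 0, max = 0.5)
)

# Extra two-level column: "Year 10" versus everything else
df_data$cohort_lumped <- ifelse(df_data$cohort == "year10", "Year 10", "Year 1 - 9")

p <- ggplot(
  df_data,
  aes(x = lab, y = data_rel, group = cohort, fill = cohort_lumped)
) +
  geom_col(position = position_dodge()) +  # group = cohort keeps one bar per year
  scale_fill_manual(values = c("Year 1 - 9" = "#7F7F7F", "Year 10" = "#389DC3")) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  labs(x = "", y = "", fill = "") +
  theme_classic() +
  theme(legend.position = "bottom")

p
```

Because the column already holds the legend text, no limits or labels arguments are needed in scale_fill_manual() here.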
