Drop colors from legend which are not in data

Time:09-17

I'd like to have a function like scale_color_sex which assigns predefined colors to the categories of the color variable. I have stored these predefined values in a named character vector, which I provide in a modified scale_color_manual (see in code).

Ideally, a user of my custom function scale_color_sex provides only some data (here starwars) and the name of the sex category (here sex) and scale_color_sex assigns the right color to geom_point. The code below produces the desired result.

But I'd like to drop colors from the legend that are not represented in the data. In this example, that is the "NotInData" category in "red", which I don't want to see in the plot. How can I achieve this dynamically?

Bonus points: Can I use some kind of regular expressions on the left-hand side of my color palette (the names of the named character vector)?

Any advice, also on other ways to construct a color palette based on the values of sex is highly welcome!

library(tidyverse)

scale_color_sex <- function(...){
  scale_color_manual(
    ...,
    values = c(
      female = "#9986A5", 
      hermaphroditic = "#79402E", 
      male = "#CCBA72",
      none = "#0F0D0E", 
      NotInData = "red"
    )
  )
}

starwars %>% 
  ggplot(aes(x = height, y = birth_year, color = sex)) +
  geom_point() +
  scale_color_sex()
#> Warning: Removed 44 rows containing missing values (geom_point).

CodePudding user response:

Take a look at the drop argument of scale_color_manual:

library(tidyverse)

scale_color_sex <- function(..., drop = FALSE){
  scale_color_manual(
    ...,
    drop = drop,
    limits = force,
    values = c(
      female = "#9986A5", 
      hermaphroditic = "#79402E", 
      male = "#CCBA72",
      none = "#0F0D0E", 
      NotInData = "red"
    )
  )
}

starwars %>% 
  ggplot(aes(x = height, y = birth_year, color = sex)) +
  geom_point() +
  scale_color_sex()
#> Warning: Removed 45 rows containing missing values (geom_point).
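
The function above passes limits = force. As a rough illustration of why that can hide unused palette entries, here is the same idea in base R. It assumes, as the ggplot2 documentation for discrete scales describes, that a function given as limits is called with the values found in the data and that its return value becomes the legend entries. The vector `present` is a made-up stand-in for those values, not something ggplot2 exposes under that name.

```r
# Full palette, including an entry that never occurs in the data
palette <- c(
  female = "#9986A5",
  hermaphroditic = "#79402E",
  male = "#CCBA72",
  none = "#0F0D0E",
  NotInData = "red"
)

# Hypothetical stand-in for the values the scale would see in the data
present <- c("female", "hermaphroditic", "male", "none")

# force() is base R's identity function: it returns its argument unchanged,
# so the limits stay exactly the values that are present
limits <- force(present)

# Only palette entries named in the limits would be needed for the legend
shown <- palette[limits]
print(names(shown))
```

Because `force` changes nothing, the limits are simply whatever occurs in the data, and "NotInData" never enters the legend.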

Created on 2021-09-10

ggplot with red dot showing

The plot shows the red "NotInData" dot in the middle. (If you were to plot the whole data at a larger scale or resolution, you might see it stand out.)

If you don't want to plot those points at all, however, the best method is to filter them out before passing the data to ggplot; by default the category is then removed from the legend.

filter(starwars, sex != "NotInData") %>% 
  ggplot(aes(x = height, y = birth_year, color = sex)) +
  geom_point() +
  scale_color_sex()

ggplot with "NotInData" removed from legend
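
The bonus question about regular expressions is not addressed above. One possible approach, offered only as a sketch: keep the patterns as the names of the color vector and resolve them against the levels that actually occur in the data before handing the result to scale_color_manual. The helper `regex_palette` and the patterns below are hypothetical and not part of ggplot2.

```r
# Hypothetical helper: for each observed level, return the color of the
# first pattern (a name of `patterns`) that matches it, or NA if none does.
regex_palette <- function(levels, patterns) {
  vapply(levels, function(lv) {
    hit <- which(vapply(names(patterns), grepl, logical(1), x = lv))
    if (length(hit) == 0) NA_character_ else unname(patterns[hit[1]])
  }, character(1))
}

# Illustrative patterns; the names are regular expressions
patterns <- c(
  "^f" = "#9986A5",
  "^herm" = "#79402E",
  "^m" = "#CCBA72",
  "^none$" = "#0F0D0E"
)

# Stand-in for unique(na.omit(starwars$sex))
observed <- c("female", "male", "none", "hermaphroditic")

pal <- regex_palette(observed, patterns)
print(pal)

# Possible use inside a plot (not run here):
# scale_color_manual(values = pal)
```

Since the result only contains levels that occur in the data, unused colors cannot appear in the legend. Where several patterns match a level, this sketch takes the first one, so the order of `patterns` matters.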
