Drop NA values from scale_fill_* either in aesthetics or in the scale

Time:12-07

Context

I have a dataset that has some NA in a variable that is being used for grouping (I know, shocking!). When I throw this data at my ggplot2 blender, these NA values are treated (by default) as a group/level, which is not what I want for this plot.

I am using a standardized layer configuration for all plots, so I (a) do all data wrangling once, for all plots that will be created, and (b) configure all layers that are common to all plots. I save this standard setup in a "blank" gg object and then just call it for every plot with the specific variables required for each occasion.

So, if I remove these NA values from the overall gg object, I will also lose valid observations for other plots, and unfortunately this would make the client unhappy (and I don't want that). Sure, I could create a separate data frame just for this plot, and everything would be nice and dandy, the client would be happy, everyone would be happy. This is exactly how I approached it.

Now that I have time to think, I am curious about what the smart ggplot2 way would be.

Reproducible example: the issue

library(tidyverse)

gg <- airquality %>%
  # create some NA values in the grouping var
  slice_head(n = 150) %>%
  mutate(group = as.character(sample(c(1:2, NA), 150, replace = TRUE))) %>%
  # standard configs for multiple plots
  ggplot() +
  theme_bw() +
  scale_fill_brewer(palette = "Paired")

One particular plot is giving me trouble

gg + geom_density(aes(Solar.R, fill = group), alpha = .7)

Which results in:

[Image: density plot of Solar.R with NA drawn as its own group]

What I have tried

I tried adding na.translate = FALSE in the scale layer as described in the docs. It helps, since it removes the NA group from the legend, but the density curve for NA is still drawn:

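gg <- airquality %>%
  # same as above, except for the scale
  slice_head(n = 150) %>%
  mutate(group = as.character(sample(c(1:2, NA), 150, replace = TRUE))) %>%
  ggplot() +
  theme_bw() +
  scale_fill_brewer(palette = "Paired", na.translate = FALSE)
gg + geom_density(aes(Solar.R, fill = group), alpha = .7)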

[Image: same plot, with NA gone from the legend but its density curve still drawn]

I also tried adding na.value = NA to the scale_fill function, but it didn't seem to make a difference. Also values like NULL and "blank" are not permitted by this particular scale.

Finally, I have seen geom_bar() examples with na.rm = TRUE in the aesthetic definition, but geom_density() refuses to accept this solution.

Question

  • How do I drop this NA level altogether from one particular plot that shares a standard configuration with its sibling plots, without disturbing the others?

I'm guessing there is some clever way of defining the aesthetics, or maybe another argument to scale_fill_*() that I'm not aware of.

Answer

Perhaps one of these approaches would suit?

library(tidyverse)
set.seed(123)

test_data <- airquality %>%
  # create some NA values in the grouping var
  slice_head(n = 150) %>%
  mutate(group = as.character(sample(c(1:2, NA), 150, replace = TRUE)))

# standard configs for multiple plots
gg <- ggplot(test_data) +
  theme_bw() +
  scale_fill_brewer(palette = "Paired")

# with NAs
gg + geom_density(aes(Solar.R, fill = group), alpha = .7)
#> Warning: Removed 7 rows containing non-finite values (`stat_density()`).

# without NAs
gg + geom_density(data = test_data %>% filter(!is.na(group)),
                  aes(Solar.R, fill = group), alpha = .7)
#> Warning: Removed 5 rows containing non-finite values (`stat_density()`).


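Or perhaps this method, which keeps everything in one pipeline and passes the layer a function (`. %>% filter(...)`) that is applied to the plot's own data?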


set.seed(123)
gg <- airquality %>%
  # create some NA values in the grouping var
  slice_head(n = 150) %>%
  mutate(group = as.character(sample(c(1:2, NA), 150, replace = TRUE))) %>%
  # standard configs for multiple plots
  {ggplot(data = .) +
  theme_bw() +
  scale_fill_brewer(palette = "Paired")}

gg +
    geom_density(data = . %>% filter(!is.na(group)), aes(Solar.R, fill = group), alpha = .7)
#> Warning: Removed 5 rows containing non-finite values (`stat_density()`).

Created on 2022-12-06 with reprex v2.0.2
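
As a follow-up sketch (not part of the original answer): the second method works because a layer's `data` argument can be a function, which ggplot2 calls on the plot's data when the plot is built. The same idea can be written without the pipe. The helper name `drop_na_in()` below is hypothetical, not a ggplot2 function, and `na.rm = TRUE` is added only on the assumption that you also want to silence the warning about missing Solar.R values; it does not drop the NA group by itself.

```r
library(ggplot2)

set.seed(123)
test_data <- head(airquality, 150)
test_data$group <- as.character(sample(c(1:2, NA), 150, replace = TRUE))

# shared "blank" plot, same standard configuration as in the question
gg <- ggplot(test_data) +
  theme_bw() +
  scale_fill_brewer(palette = "Paired")

# Hypothetical helper (not part of ggplot2): returns a function that
# drops the rows of a data frame where column `var` is NA.
drop_na_in <- function(var) {
  function(d) d[!is.na(d[[var]]), , drop = FALSE]
}

# Only this layer sees the filtered data; `gg` itself is left untouched.
p <- gg + geom_density(aes(Solar.R, fill = group),
                       data = drop_na_in("group"),
                       alpha = .7, na.rm = TRUE)
p
```

Because the filtering lives in the layer rather than in `gg`, sibling plots built from the same `gg` object still see every row.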
