ggplot2: Can you access the .data argument in subsequent layers?

Time:11-21

I'm generating multiple graphs from one data set. I perform many operations on the data (filtering rows, aggregating rows, calculations over columns, etc.) before passing the result to ggplot(). I want to access the data I passed to ggplot() in subsequent ggplot layers and facets, so that I have more control over the resulting graph and can include some characteristics of the data in the plot itself, for example the number of observations.

Here is a reproducible example:

library(tidyverse)

cars <- mtcars

# Normal scatter plot
cars %>% 
    filter(
        # Many complicated operations
    ) %>% 
    group_by(
        # More complicated operations
        across()
    ) %>% 
    summarise(
         # Even more complicated operations
        n = n()
    ) %>% 
    ggplot(aes(x = mpg, y = qsec)) +
    geom_point() +
    
    # Join the dots but only if mpg < 20 
    geom_line(data = .data %>% filter(mpg < 20)) +
    
    # Include the total number of observations in the graph
    labs(caption = paste("N. obs =", NROW(.data)))

One could of course create a separate data set before passing it on to ggplot() and then reference that data set throughout (as in the example below). However, this is much more cumbersome, as you need to save (and later remove) a data set for each graph and run two separate commands for just one graph.

I want to know if there is something that can be done that's more akin to the first example using .data (which obviously doesn't actually work).

library(tidyverse)

cars <- mtcars

tmp <- cars %>% 
    filter(
        # Many complicated operations
    ) %>% 
    group_by(
        # More complicated operations
        across()
    ) %>% 
    summarise(
         # Even more complicated operations
        n = n()
    )

tmp %>% 
    ggplot(aes(x = mpg, y = qsec)) +
    geom_point() +
    
    # Join the dots but only if mpg < 20 
    geom_line(data = tmp %>% filter(mpg < 20)) +
    
    # Include the total number of observations in the graph
    labs(caption = paste("N. obs =", NROW(tmp)))

Thanks for your help!

Answer:

The help page for each geom_*() function describes a standard way to do this, in the documentation of the data argument:

A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data. A function can be created from a formula (e.g. ~ head(.x, 10)).
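As a minimal sketch of that behaviour (using the full mtcars data rather than the summarised data from the question, purely as an assumption for illustration), a plain function passed as data receives the plot data and returns the subset for that layer. layer_data() is used here only to inspect what each layer ends up with:

```r
library(ggplot2)

# The function given to `data` is called with the plot data (here, all of
# mtcars) and must return a data frame that is used for that layer only.
p <- ggplot(mtcars, aes(x = mpg, y = qsec)) +
  geom_point() +
  geom_line(data = function(d) d[d$mpg < 20, ])

# layer_data() builds the plot and returns the data used by layer i.
n_points <- nrow(layer_data(p, 1))  # point layer: every row of mtcars
n_line   <- nrow(layer_data(p, 2))  # line layer: only rows with mpg < 20
```

The point layer keeps every row, while the line layer only sees the rows the function returned.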

For labs(), on the other hand, you can use the magrittr . placeholder, but you have to a) pass . explicitly as the data argument to ggplot() and b) wrap the whole plot expression in curly braces so that the later . is recognised.

So for example:

library(tidyverse)

cars <- mtcars

# Normal scatter plot
cars %>%
  filter() %>%
  group_by(across()) %>%
  summarise(n = n()) %>%
  {
    ggplot(., aes(x = mpg, y = qsec)) +
      geom_point() +
      geom_line(data = ~ filter(.x, mpg < 20)) +
      labs(caption = paste("N. obs =", NROW(.)))
  }

Or if you don't like the purrr formula syntax, then the flashy new R anonymous functions (R 4.1 and later) work too:

geom_line(data = \(x) filter(x, mpg < 20)) +

Unfortunately, labs() doesn't seem to have an explicit way of accessing the data that is passed invisibly through the ggplot stack, since by and large it can get on with its job without touching the main data. The approaches above are some ways around this.
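Another possible workaround, if you use the native |> pipe (which has no . placeholder and no curly-brace form), is to pipe into an anonymous function so the prepared data gets a name that every layer and labs() can see. This is only a sketch: the name d is arbitrary, and filter(cyl > 4) is a stand-in for the "many complicated operations" in the question, not something from the original post.

```r
library(dplyr)
library(ggplot2)

p <- mtcars |>
  filter(cyl > 4) |>   # placeholder for the real data preparation
  (\(d) {
    # Inside the function, `d` is the prepared data, so it can be reused
    # anywhere in the plot expression, including labs().
    ggplot(d, aes(x = mpg, y = qsec)) +
      geom_point() +
      # Join the dots but only if mpg < 20
      geom_line(data = \(x) filter(x, mpg < 20)) +
      # Include the total number of observations in the graph
      labs(caption = paste("N. obs =", nrow(d)))
  })()
```

Print p to draw the plot. No temporary object is left in the global environment, because d only exists inside the anonymous function.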
