Ggplot loop over unique variables in a group

Time:10-02

I wrote a loop that makes a plot for every unique value of a variable within a group. To make my code reproducible I used the nycflights13 package. Unfortunately, with this data my code gives the desired result, so the problem does not show up. In my own data, however, some flight origins do not occur in a certain year, which gives me an empty plot for that origin in that year. I would like only the origins that actually occur within a group (in this example, a year) to be shown. Could somebody help me out?

library(nycflights13)
library(tidyverse)

plotter_de_plot<-function(origination, YEARR){
  eval(substitute(origination), flights)
  eval(substitute(YEARR), flights)
  flights %>%
    subset(year==YEARR)%>%
  select(month,origin,hour,year)%>%
    group_by(origin, month) %>% 
    mutate(AMOUNT = (sum(hour, na.rm=TRUE)))  %>%
    filter(!is.na(hour),
           origin==origination,year==YEARR) %>%
    ggplot(aes(month,AMOUNT), na.rm = TRUE) +
    geom_point() +
    labs(title=origination,subtitle=YEARR)
} 
for (i in unique(flights$origin)){
  plot(plotter_de_plot(i,2013))
}

CodePudding user response:

One option would be to split your pipeline into two parts: data wrangling and plotting. You can then check whether the filtered and aggregated dataset contains any rows, e.g. with nrow(d) > 0, and return NULL if it doesn't. In your for loop you then check for NULL before plotting.

To mimic your use case I used flights$year[flights$origin == "EWR"] <- 2015 so that the example data includes an origin with no data for year 2013:

library(nycflights13)
library(tidyverse)

plotter_de_plot <- function(origination, YEARR) {
  d <- flights %>%
    select(month, origin, hour, year) %>%
    filter(
      !is.na(hour),
      origin == origination, year == YEARR
    ) %>% 
    group_by(month) %>%
    mutate(AMOUNT = sum(hour, na.rm = TRUE))
    
  if (nrow(d) > 0) {
    ggplot(d, aes(month, AMOUNT), na.rm = TRUE) +
      geom_point() +
      labs(title = origination, subtitle = YEARR)
  }
}

flights$year[flights$origin == "EWR"] <- 2015

for (i in unique(flights$origin)) {
  p <- plotter_de_plot(i, 2013)
  if (!is.null(p)) plot(p)
}
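
A related variation, sketched here under assumptions, is to work out up front which origins actually occur in the chosen year and loop only over those, so no NULL check is needed. The toy data frame and the helper name origins_in_year are hypothetical and only serve as an illustration; they are not part of nycflights13 or of the code above.

```r
library(dplyr)

# Hypothetical toy data standing in for the real data:
# EWR only occurs in 2015, and the 2015 JFK row has a missing hour
toy <- tibble(
  year   = c(2013, 2013, 2015, 2015),
  origin = c("JFK", "LGA", "EWR", "JFK"),
  hour   = c(5, 6, 7, NA)
)

# Hypothetical helper: origins with at least one non-missing hour in year `yr`
origins_in_year <- function(data, yr) {
  data %>%
    filter(year == yr, !is.na(hour)) %>%
    distinct(origin) %>%
    pull(origin)
}

origins_in_year(toy, 2013)  # "JFK" "LGA"
origins_in_year(toy, 2015)  # "EWR"
```

With something like this, the for loop could iterate over origins_in_year(flights, 2013) instead of unique(flights$origin).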

CodePudding user response:

In addition to stefan's answer, which addresses the problem perfectly, I would recommend using purrr::map instead of your for loop:

my_plots = unique(flights$origin) %>% 
  set_names() %>% 
  map(plotter_de_plot, YEARR=2013)
my_plots$EWR
my_plots$LGA
my_plots$JFK

This way, you can access each plot inside a list. Another way would be to use facets.

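Also, your plots are very heavy (several MB) and might take a long time to draw. That is because you are using mutate() instead of summarise(): mutate() keeps one row per flight, so every point is drawn many times on top of itself, whereas summarise() returns a single row per group.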
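
To make the difference concrete, here is a minimal sketch on a small made-up data frame (the toy data is an assumption for illustration, not taken from nycflights13). It only compares how many rows each verb returns:

```r
library(dplyr)

# Hypothetical toy data: 3 rows in group A, 2 rows in group B
toy <- tibble(
  origin = c("A", "A", "A", "B", "B"),
  hour   = c(1, 2, 3, 4, 5)
)

# mutate() keeps every input row and repeats the group total on each one
with_mutate <- toy %>%
  group_by(origin) %>%
  mutate(AMOUNT = sum(hour, na.rm = TRUE)) %>%
  ungroup()

# summarise() collapses each group to a single row
with_summarise <- toy %>%
  group_by(origin) %>%
  summarise(AMOUNT = sum(hour, na.rm = TRUE))

nrow(with_mutate)     # 5
nrow(with_summarise)  # 2
```

Plotting the mutate() result would draw the same point once per input row, which is what makes the plots in the question so large.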

Here is an example with facets that took <1 sec to compute:

flights %>%
  filter(year==2013)%>%
  select(month, origin, hour,year)%>%
  group_by(origin, month) %>% 
  summarise(AMOUNT = (sum(hour, na.rm=TRUE))) %>%
  ggplot(aes(month,AMOUNT), na.rm = TRUE) +
  geom_point() +
  labs(subtitle="Year 2013") +
  facet_wrap(~origin)
