How do I iterate over different length lists?

I'm trying to create a looped function that will generate the same visual through a combination of values from two fields in the dataset. However, the two fields are different lengths.

REPORT_TYPE has two unique values while CRITERIA_NO has 10. The first report type only goes up to criteria 7, and then the plots stop. I don't know how to make it move on to the next value in REPORT_TYPE and continue down the criteria list.

This is what my function looks like:

plot <- function(df, x, y){
  # create list of reports and criterias in data to loop over
  
  rpt_list<-unique(Property$REPORT_TYPE)
  crit_list<-unique(Property$CRITERIA_NO)

  for (i in length(rpt_list)) {

    for(j in seq_along(crit_list)){

      x_var <- enquo(x)
      y_var <- enquo(y)
      
    blah<-ggplot(data=subset(Property, REPORT_TYPE==rpt_list[[i]] & CRITERIA_NO==crit_list[[j]]), aes(x=!!x_var, y=!!y_var)) +
            geom_bar(stat="identity") +
            facet_wrap(~DESCRIPTION) +
            ggtitle(expression('Properties by Qtr'))
    print(blah)
    }
  }
}

This is the error I'm getting.

 Error: Faceting variables must have at least one value
Run `rlang::last_error()` to see where the error occurred.

The graphing part works but it seems like there's an issue between lines 4-8 where I'm trying to create a list to iterate over.

UPDATE: data structure

structure(list(QTR_END_DATE = structure(c(15795, 15795, 15795, 
15795, 15795, 15795, 15795, 15795, 15795, 15795), class = "Date"), 
    REPORT_TYPE = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
    2L, 2L), .Label = c("PT", "RE", "DU", "OY", "ST", "SZ"), class = "factor"), 
    CRITERIA_NO = c(1, 2, 3, 4, 5, 6, 7, 1, 2, 3), DESCRIPTION = structure(c(57L, 
    66L, 68L, 75L, 82L, 77L, 72L, 74L, 71L, 60L), .Label = c("$10 M to $15 M                                    ", 
    "$15 M to $25 M                                    ", "$25 M to $35 M                                    ", 
    "$35 M to $50 M                                    ", "$5 M to $10 M                                     ", 
    "$50 M to $100 M                                   ", "1 - 3 years                                       ", 
    "1976", "1977", "1978", "1979", "1980", "1981", "1982", "1983", 
    "1984", "1985", "1986", "1987", "1988", "1989", "1990", "1991", 
    "1992", "1993", "1994", "1995", "1996", "1997", "1998", "1999", 
    "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", 
    "2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", 
    "2016", "2017", "2018", "2019", "2020", "2021", "3 - 5 years                                       ", 
    "5 - 7 years                                       ", "7 - 10 years                                      ", 
    "Apartment                                         ", "Current                                           ", 
    "Delinquent                                        ", "East North Central                                ", 
    "East South Central                                ", "Extended                                          ", 
    "Foreclosed                                        ", "Greater than $100 M                               ", 
    "Greater than 10 years                             ", "Hotel/Motel                                       ", 
    "In-Process of Foreclosure/ Foreclosed             ", "Industrial                                        ", 
    "Less than 1 year                                  ", "Loans less than $5 M                              ", 
    "Mid Atlantic                                      ", "Mixed Use                                         ", 
    "Mountain                                          ", "New England                                       ", 
    "Office Building                                   ", "Other                                             ", 
    "Other Commercial                                  ", "Pacific                                           ", 
    "Paid                                              ", "Prior to 1976                                     ", 
    "Restructured                                      ", "Retail                                            ", 
    "Sold                                              ", "South Atlantic                                    ", 
    "West North Central                                ", "West South Central                                "
    ), class = "factor"), NUMBER_PROPERTY = c(808, 28, 972, 883, 
    1012, 235, 18, 155, 734, 356)), row.names = c(NA, 10L), class = "data.frame")

and call

plot(Property, QTR_END_DATE, NUMBER_PROPERTY)

CodePudding user response：

The problem is that you are iterating over every combination of REPORT_TYPE and CRITERIA_NO even though some combinations, like RE-4, don't exist in the data. For those combinations subset() returns an empty data frame, and passing that to ggplot() is what eventually produces the faceting error you're seeing.
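To check which combinations are actually missing before looping, a cross-tabulation is a quick diagnostic. This is only a sketch: `toy` is a hypothetical cut-down copy of two columns from the posted sample (PT with criteria 1-7, RE with criteria 1-3), and `counts` and `empty` are names made up for the illustration.

```r
# Hypothetical toy copy of the posted sample: PT has criteria 1-7, RE has 1-3.
toy <- data.frame(
  REPORT_TYPE = factor(c(rep("PT", 7), rep("RE", 3)),
                       levels = c("PT", "RE", "DU", "OY", "ST", "SZ")),
  CRITERIA_NO = c(1:7, 1:3)
)

# Count rows per (REPORT_TYPE, CRITERIA_NO) pair; droplevels() removes the
# report types that never occur so they do not add all-zero rows.
counts <- as.data.frame(
  table(REPORT_TYPE = droplevels(toy$REPORT_TYPE),
        CRITERIA_NO = toy$CRITERIA_NO)
)

# Pairs with Freq == 0 are the ones the nested loops would hand to ggplot()
# as an empty data frame.
empty <- counts[counts$Freq == 0, ]
print(empty)
```

On this toy data the empty pairs are all under RE, which matches the point where the plots stop.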

Here's an example way to fix this.

library(tidyverse)
quarter_plot = function(data, x, y) {
  data %>% 
    split(list(.$REPORT_TYPE, .$CRITERIA_NO)) %>% 
    discard(~ nrow(.x) == 0) %>% 
    map(function(sub_data) {
      # x and y are column names passed as strings
      ggplot(sub_data, aes_string(x = x, y = y)) +
        geom_bar(stat = "identity") +
        facet_wrap(~ DESCRIPTION) +
        ggtitle("Properties by Qtr")
    })
}

# Example usage
quarter_plot(Property, "QTR_END_DATE", "NUMBER_PROPERTY")

In my implementation, I split the data by each combination of REPORT_TYPE and CRITERIA_NO and then removed the cases where there is no data. Afterwards I used purrr::map() to generate a plot for each sub-dataframe. A couple of notes regarding this implementation:

Instead of passing symbols into the function I changed it to strings, as aes_string() makes the implementation cleaner. Feel free to revert this.
You accidentally used Property in your function even though df is the parameter - I fixed this.
I don't recommend naming your function plot(), as that masks base R's plot() function.
I recommend that instead of printing the plots, you actually return them (this was done in my version using map()). This gives you the option to save the plots to a variable, and you are still easily able to print the plots to the screen by just running the function in your console.
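The split-then-discard step can also be sketched in base R alone, which may help to see what each sub-dataframe looks like before any plotting happens. This is an illustration under assumptions, not part of the function above: `toy` is a hypothetical cut-down copy of the posted sample and `pieces` is a made-up name. It relies on the `drop = TRUE` argument of split(), which leaves out combinations with no rows.

```r
# Hypothetical toy copy of the posted sample.
toy <- data.frame(
  REPORT_TYPE = factor(c(rep("PT", 7), rep("RE", 3)),
                       levels = c("PT", "RE", "DU", "OY", "ST", "SZ")),
  CRITERIA_NO = c(1:7, 1:3),
  NUMBER_PROPERTY = c(808, 28, 972, 883, 1012, 235, 18, 155, 734, 356)
)

# One data frame per (REPORT_TYPE, CRITERIA_NO) pair that occurs in the data;
# drop = TRUE omits the pairs with no rows, so nothing has to be discarded.
pieces <- split(toy, list(toy$REPORT_TYPE, toy$CRITERIA_NO), drop = TRUE)

# Each element is named "<REPORT_TYPE>.<CRITERIA_NO>", e.g. "PT.1".
print(names(pieces))
print(sapply(pieces, nrow))
```

Because the result is a named list, the names could be reused as plot titles or file names when the plots are returned rather than printed.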

If you want to make minimal changes to your original function, you could go with something like:

plot <- function(df, x, y){
  # create list of reports and criterias in data to loop over
  
  rpt_list<-unique(df$REPORT_TYPE)
  crit_list<-unique(df$CRITERIA_NO)
  
  for (i in seq_along(rpt_list)) {
    for(j in seq_along(crit_list)){
      data=subset(
        df, 
        REPORT_TYPE==rpt_list[[i]] & CRITERIA_NO==crit_list[[j]]
      )
      if (nrow(data) == 0) {
        next
      }
      x_var <- enquo(x)
      y_var <- enquo(y)
      
      blah<-ggplot(data, aes(x=!!x_var, y=!!y_var)) +
        geom_bar(stat="identity") +
        facet_wrap(~DESCRIPTION) +
        ggtitle(expression('Properties by Qtr'))
      print(blah)
    }
  }
}
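Note that the outer loop in this version uses seq_along(rpt_list) where the question used length(rpt_list). The difference matters: length() returns a single number, so the loop body runs once, for the last index only. A minimal sketch with a made-up two-element vector (the `visited_*` names are hypothetical):

```r
rpt_list <- c("PT", "RE")

# for (i in length(x)) iterates over the single value length(x)
visited_length <- c()
for (i in length(rpt_list)) {
  visited_length <- c(visited_length, i)
}

# for (i in seq_along(x)) iterates over every index 1, 2, ..., length(x)
visited_seq <- c()
for (i in seq_along(rpt_list)) {
  visited_seq <- c(visited_seq, i)
}

print(visited_length)  # only the last index
print(visited_seq)     # every index
```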

CodePudding user response：

I think the error occurs when you subset the data for the REPORT_TYPE == RE and CRITERIA_NO == 4 case, since that gives you a data set with 0 rows.

One way to avoid this error is to take the subset first, check whether it has 0 rows, and only build the plot when it does not:

library(ggplot2)

plot <- function(df, x, y){
  # create list of reports and criterias in data to loop over
  rpt_list <- unique(df$REPORT_TYPE)
  crit_list <- unique(df$CRITERIA_NO)

  # capture the column arguments once, outside the loops
  x_var <- enquo(x)
  y_var <- enquo(y)

  for (i in seq_along(rpt_list)) {
    for (j in seq_along(crit_list)) {
      # subset first, then plot only when the subset has rows
      tmp <- subset(df, REPORT_TYPE == rpt_list[[i]] & CRITERIA_NO == crit_list[[j]])

      if (nrow(tmp) != 0) {
        blah <- ggplot(data = tmp, aes(x = !!x_var, y = !!y_var)) +
          geom_bar(stat = "identity") +
          facet_wrap(~DESCRIPTION) +
          ggtitle(expression('Properties by Qtr'))
        print(blah)
      }
    }
  }
}