Creating multiple plots within a loop and saving in R?

I am having trouble saving multiple plots from the output of a loop. To give some background:

I have multiple data frames, each holding the toxicity data for a single chemical across multiple species. I have named each data frame after the chemical it represents, i.e. "ChemicalX". The data is in this format because this is how the "ssdtools" package works: it creates a species sensitivity distribution (SSD) for a single chemical.

Because I have a lot of chemicals, I want to create a loop that iterates over each data frame, calculates the metrics required to create an SSD, plots the SSD, and then saves the plot.

The code below works for calculating all of the metrics and plotting the SSDs. It only breaks when I try to create a title within the loop, and when I try to save the plot within the loop.

For reference, I am using the packages: ssdtools, ggplot2, tidyverse, fitdistrplus

My code is as follows:

```
# Create a list of data frames
list_dfs <- list(ChemicalX, ChemicalY, ChemicalZ)

# make the loop
for (i in list_dfs){ # for each chemical (i.e. data frame)
  ssd.fits <- ssd_fit_dists(i, dists = c("llogis", "gamma", "lnorm", "gompertz", "lgumbel", "weibull", "burrIII3", "invpareto", "llogis_llogis", "lnorm_lnorm")) # Test the goodness of fit using all distributions available
  ssd.gof_fits <- ssd_gof(ssd.fits) # Save the goodness of fit statistics
  chosen_dist <- ssd.gof_fits %>% # Choose the best fit distribution by
    filter(aicc==min(aicc)) # finding the minimum aicc
  final.fit <- ssd_fit_dists(i, dists = chosen_dist$dist) # Use the chosen distribution only
  final.predict <- predict(final.fit, ci = TRUE) # generate the final predictions
  plotdata <- i # create a separate plot data frame
  final.plot <- ssd_plot(plotdata, final.predict, # generate the final plot
                         color = "Taxa",
                         label = "Species",
                         xlab = "Concentration (ug/L)", ribbon = TRUE) +
    expand_limits(x = 10e6) + # to ensure the species labels fit
    ggtitle(paste("Species Sensitivity for",chem_names_df[i], sep = " ")) +
    scale_colour_ssd()
  ggsave(filename = paste("SSD for",chem_names_df[i], ".pdf", sep = ""),
         plot = final.plot)
}
```

The code works great right up until the last part, where I want to create a title for each chemical in each iteration, and where I want to save the filename as the chemical name.

I have two issues:

I want the title of the plot to be "Species Sensitivity for ChemicalX", with ChemicalX being the name of the data frame. However, when I use the following code, the title is garbled and shows a list of the species in that data frame (see image).
```
ggtitle(paste("Species Sensitivity for", i, sep = " "))
```
[Image: graph title output using "i"]

To try to get around this, I created a vector of chemical names that matches the order of the data frame list, called "chem_names_df". However, when I use ggtitle(paste("Species Sensitivity for",chem_names_df[i], sep = " ")), I get the error: Error in chem_names_df[i] : invalid subscript type 'list'

A similar issue happens when I try to save the plot using ggsave. I am trying to save the file for each chemical data frame as "SSD_ChemicalX", but, as above, it outputs a list of the species where the chemical name should be.

I think it has something to do with how R is reading from my list of data frames. I am not sure why it returns the species list (i.e. c("Danio Rerio", "Lepomis macrochirus", ...)) instead of the chemical name.

Any help would be appreciated! Thank you!

CodePudding user response：

Basically your problem here is that you are sometimes using i as if it is an index, and sometimes as if it is a data frame, but in fact it is a data frame.

Your example is not reproducible, so let me provide one. You have done the equivalent of:

```
list_dfs2 <- list(mtcars, mtcars, cars)

for(i in list_dfs2){
    print(i)
}
```

This is just going to print the whole mtcars dataset twice and then the cars dataset. You can then define a vector:

```
cars_names <- c("mtcars", "mtcars", "cars")
```

If you call cars_names[i], on the first iteration you're not calling cars_names[1]; you're trying to subset a vector with an entire data frame. That won't work. It is better to loop over seq_along() of your list of data frames, so that i is an index, and then use list_dfs[[i]] when you want the actual data frame rather than the index. Something like:

```
library(ssdtools)
library(tidyverse) # provides %>%, filter() and ggplot2

# Create a list of data frames
list_dfs <- list(ChemicalX, ChemicalY, ChemicalZ)
# Vector of chemical names, in the same order as list_dfs
chem_names_df <- c("ChemicalX", "ChemicalY", "ChemicalZ")

# make the loop
for (i in seq_along(list_dfs)){ # i is now an index: 1, 2, 3
  ssd.fits <- ssd_fit_dists(list_dfs[[i]], dists = c("llogis", "gamma", "lnorm", "gompertz", "lgumbel", "weibull", "burrIII3", "invpareto", "llogis_llogis", "lnorm_lnorm")) # Test the goodness of fit using all distributions available
  ssd.gof_fits <- ssd_gof(ssd.fits) # Save the goodness of fit statistics
  chosen_dist <- ssd.gof_fits %>% # Choose the best fit distribution by
    filter(aicc==min(aicc)) # finding the minimum aicc
  final.fit <- ssd_fit_dists(list_dfs[[i]], dists = chosen_dist$dist) # Use the chosen distribution only
  final.predict <- predict(final.fit, ci = TRUE) # generate the final predictions
  plotdata <- list_dfs[[i]] # create a separate plot data frame
  # Layers must be chained with `+`, otherwise they are never added to the plot
  final.plot <- ssd_plot(plotdata, final.predict, # generate the final plot
                         color = "Taxa",
                         label = "Species",
                         xlab = "Concentration (ug/L)", ribbon = TRUE) +
    expand_limits(x = 10e6) + # to ensure the species labels fit
    ggtitle(paste("Species Sensitivity for", chem_names_df[i], sep = " ")) +
    scale_colour_ssd()
  # paste0() with an explicit space gives e.g. "SSD for ChemicalX.pdf"
  ggsave(filename = paste0("SSD for ", chem_names_df[i], ".pdf"),
         plot = final.plot)
}
```
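To see the difference between the two loop styles in isolation, here is a minimal base R sketch. It uses the built-in mtcars and cars datasets as stand-ins for the chemical data frames, and df_names is a hypothetical name vector playing the role of chem_names_df. It only builds titles and file names, so it runs without ssdtools or ggplot2.

```r
# Built-in datasets standing in for the chemical data frames (assumption)
list_dfs <- list(mtcars, cars)
df_names <- c("mtcars", "cars") # hypothetical, same order as list_dfs

# Looping over the elements: i is a whole data frame, so paste()
# returns one string per column instead of a single title
for (i in list_dfs) {
  bad_title <- paste("Plot for", i)
  print(length(bad_title))
}

# Looping over the indices: i is an integer and can subset
# both the list and the name vector
titles <- character(length(list_dfs))
files <- character(length(list_dfs))
for (i in seq_along(list_dfs)) {
  df <- list_dfs[[i]] # the actual data frame for this iteration
  titles[i] <- paste("Plot for", df_names[i])
  files[i] <- paste0("plot_", df_names[i], ".pdf")
}
print(titles)
print(files)
```

The first loop reproduces the garbled title from the question: paste() treats the data frame as a list of columns and returns a string for each of them.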

CodePudding user response：

Consider defining a function that takes the data frame and its name as input parameters. Then pass a named list to the function with Map, which iterates through the data frames and their corresponding names elementwise:

Function

```
# Assumes ssdtools, ggplot2 and dplyr (for filter) are loaded
build_plot <- function(plotdata, plotname) {
  # Test the goodness of fit using all distributions available
  ssd.fits <- ssd_fit_dists(
    plotdata,
    dists = c(
      "llogis", "gamma", "lnorm", "gompertz", "lgumbel", "weibull",
      "burrIII3", "invpareto", "llogis_llogis", "lnorm_lnorm"
    )
  )
  # Save the goodness of fit statistics
  ssd.gof_fits <- ssd_gof(ssd.fits)
  # Choose the best fit distribution by finding the minimum aicc
  chosen_dist <- filter(ssd.gof_fits, aicc==min(aicc))
  # Use the chosen distribution only
  final.fit <- ssd_fit_dists(plotdata, dists = chosen_dist$dist)

  # generate the final predictions
  final.predict <- predict(final.fit, ci = TRUE)

  # generate the final plot; layers are chained with `+`
  final.plot <- ssd_plot(
      plotdata, final.predict, color = "Taxa", label = "Species",
      xlab = "Concentration (ug/L)", ribbon = TRUE) +
    expand_limits(x = 10e6) + # to ensure the species labels fit
    ggtitle(paste("Species Sensitivity for", plotname)) +
    scale_colour_ssd()

  # export plot to pdf
  ggsave(filename = paste0("SSD for ", plotname, ".pdf"), plot = final.plot)

  # return plot to environment
  return(final.plot)
}
```

Call

```
# create a named list of data frames
chem_dfs <- list(
   "ChemicalX"=ChemicalX, "ChemicalY"=ChemicalY, "ChemicalZ"=ChemicalZ
)

chem_plots <- Map(build_plot, chem_dfs, names(chem_dfs))
```
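As a rough illustration of what Map returns here, the sketch below applies the same pattern to built-in datasets. describe_df is a hypothetical stand-in for build_plot that returns a short label instead of a plot, so it runs in base R without ssdtools or ggplot2.

```r
# Hypothetical stand-in for build_plot(): returns a label, not a plot
describe_df <- function(df, df_name) {
  paste0(df_name, ": ", nrow(df), " rows, ", ncol(df), " columns")
}

# Named list of data frames, using built-in datasets (assumption)
dfs <- list(mtcars = mtcars, cars = cars)

# Map() calls describe_df(dfs[[1]], "mtcars"), then describe_df(dfs[[2]], "cars")
results <- Map(describe_df, dfs, names(dfs))

# The result is a list that keeps the names of the first argument
print(names(results))
print(results[["cars"]])
```

Because the result keeps the list names, a single plot from the real call should be retrievable by chemical name, for example chem_plots[["ChemicalX"]].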