I need to run the same code on multiple datasets. I like to do this in R Markdown files because the collapsible headers make it easier to organize and navigate my code. I rarely knit these files; instead I run specific code chunks interactively.
Some elements are the same across datasets: packages to load, a custom function, a master CSV file, etc. I prefer to put these common elements in separate code chunks at the top of the Rmd file. That way I can make a change in one place instead of modifying the same code in multiple chunks.
In my example below, when I run the Dataset 1 chunk, I want it to first run the chunks under the `# Setup` header and then run the Dataset 1 chunk. The Dataset 2 chunk should not run.
Similarly, when I run the Dataset 2 chunk, I want it to first run the `# Setup` chunks and then the Dataset 2 chunk. The Dataset 1 chunk should not run.
````rmd
# Setup

```{r Setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, tidy.opts = list(width.cutoff = 90), tidy = TRUE)
```

```{r Packages, message=FALSE, warning=FALSE}
rm(list = ls()); invisible(gc()) # clear workspace and perform garbage collection to free up memory
suppressPackageStartupMessages({
  library(tidyverse)
  library(readxl)
  library(ggplot2)
  library(rtracklayer)
  library(trackViewer)
})
```

# Specific Analyses

## Dataset 1

```{r Dataset 1 Code, message = FALSE}
dataset1 <- read_excel("~/Desktop/Dataset1.xlsx", col_names = TRUE)
```

## Dataset 2

```{r Dataset 2 Code, message = FALSE}
dataset2 <- read_excel("~/Desktop/Dataset2.xlsx", col_names = TRUE)
```
````
Answer:
I would do it by putting the setup code in a function and calling that function at the start of each analysis chunk. If setup is slow, you could add a check so it only runs once per session (but then you need to force a re-run whenever you change the setup code or data).
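A minimal sketch of that run-once check, assuming you are happy to track it with an R option. The option name `setup_done` and the `force` argument are hypothetical choices, not part of any package. An option is used rather than a global variable because the clean-up step in `Setup()` deletes global variables, whereas `options()` values persist for the whole R session.

```r
Setup <- function(force = FALSE) {
  # Skip the (possibly slow) work if it already ran in this R session,
  # unless the caller forces a re-run, e.g. after editing the setup data
  if (isTRUE(getOption("setup_done")) && !force) {
    return(invisible(FALSE))
  }

  # ... the real setup goes here: load packages, read the master CSV, etc. ...
  message("Running setup")

  # Remember that setup has completed; options() survive rm() on the workspace
  options(setup_done = TRUE)
  invisible(TRUE)
}

Setup()              # first call in a fresh session: runs the setup
Setup()              # skipped, because setup already ran
Setup(force = TRUE)  # runs again regardless
```

Use `force = TRUE` (or `options(setup_done = NULL)`) after changing anything the setup depends on.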
For example, replace your Setup and Packages chunks with this:
```{r SetupFunction, include=FALSE}
Setup <- function() {
  knitr::opts_chunk$set(echo = TRUE,
                        tidy.opts = list(width.cutoff = 90),
                        tidy = TRUE)

  # Clear the workspace and perform garbage collection to free up memory,
  # but keep this function
  removals <- ls(globalenv())
  removals <- removals[removals != "Setup"]
  rm(list = removals, envir = globalenv())
  invisible(gc())

  # Make sure packages are loaded
  suppressPackageStartupMessages({
    library(tidyverse)
    library(readxl)
    library(ggplot2)
    library(rtracklayer)
    library(trackViewer)
  })

  # Define a function. Use `<<-` so it is available globally
  newfn <<- function(...) {
    print("this is newfn")
  }
}
```
Then just call `Setup()` at the start of each analysis chunk.
The main weakness I see is that the `search()` list isn't cleaned up, so if any of your analysis chunks attach variables or packages, they will stay attached in the other chunks as well. You could fix this by saving the `search()` value and using `detach()` later to clean it up, but it's probably not needed. You shouldn't ever use `attach()`, and to be consistent with the way you have this document set up, you should put all your `library()` calls in the `Setup()` function.
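If you do want that cleanup, here is a minimal sketch of the save-`search()`-then-`detach()` idea. The helper names `record_search()` and `reset_search()` and the option name `setup_search` are hypothetical. The baseline is stored as an option because `Setup()` deletes ordinary global variables.

```r
# Save the current search path as the baseline (e.g. at the end of Setup())
record_search <- function() {
  options(setup_search = search())
}

# Detach every search-path entry added since the baseline was recorded
reset_search <- function(baseline = getOption("setup_search")) {
  if (is.null(baseline)) {
    stop("No baseline recorded; call record_search() first")
  }
  # search() lists the most recently attached entries first, so detaching in
  # this order removes dependent packages before the packages they depend on
  extra <- setdiff(search(), baseline)
  for (entry in extra) {
    detach(entry, character.only = TRUE)
  }
  invisible(extra)
}
```

You would call `record_search()` as the last step of `Setup()` and `reset_search()` at the end of each analysis chunk (or at the start of `Setup()`, once a baseline exists).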