R function to generate and save RMarkdown pdf and iterate over multiple CSVs in folder

Time:09-10

I have many CSV files with time series data from environmental sensors. All of them have columns with the same names/order, and they look like this:

# create time series columns
datetime <- seq(as.POSIXct("2022-01-14 17:00:00", tz = "UTC"), by = "15 min", length.out = 10)
siteID <- rep("04M09_2", 10)
tempC <- c(6.9783360, 6.5733036, 5.3476500, 4.1025504, 3.2613720, 
           2.4101928, 1.6562436, 1.2212088, 1.0028580, 0.8928492)
SpC <- rep(0, 10)
wetdry <- rep("dry", 10)
lat <- rep(39.07982, 10)
long <- rep(-96.5816, 10)
field_SpC <- c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 20)

# make data frame
sensor_04M09 <- data.frame(datetime, siteID, tempC, SpC, wetdry, lat, long, field_SpC)

I would like to write an R function that I can run over an entire folder of CSV data from these sensors (one CSV file per sensor) to produce and save a PDF of the document below for each sensor. Here is what I want the Markdown to look like. (Note: the CSV read in at the start has the same structure as the example data frame created above.)

---
title: "04M09_2 STIC Summary"
author: "Me"
date: '2022-09-08'
output: pdf_document
---
knitr::opts_chunk$set(echo = TRUE)

Bring in processed STIC data frame

library(tidyverse)
sensor_04M09 <- read_csv("sensor_04M09.csv")

head(sensor_04M09)

Time series of SpC colored by wet/dry designation (red dot represents field SpC measurement)

ggplot(sensor_04M09, aes(x = datetime, y = SpC, color = wetdry, group = 1)) +
  geom_path(size = 0.7) +
  geom_point(aes(x = datetime, y = field_SpC), size = 3, color = "red") +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(colour = "black", size = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14))

Time series of Temperature (C) recorded by sensor

ggplot(sensor_04M09, aes(x = datetime, y = tempC)) +
  geom_path() +
  geom_smooth(color = "steelblue", se = FALSE) +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(colour = "black", size = 1)) +
  theme(axis.text = element_text(size = 12),
        axis.title = element_text(size = 14))

Map of Sensor location

library(Rcpp)
library(sp)
library(raster)
library(rgdal)
library(rasterVis)
library(sf)

# Bring in stream line shape files
konza_streams <- st_read("GIS210/GIS210.shp")

sensor_location <- st_as_sf(sensor_04M09,
                                coords = c("long", "lat"), 
                                crs = 4326)

ggplot() +
  geom_sf(data = konza_streams) +
  geom_sf(data = sensor_location, size = 3, color = "red") +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(colour = "black", size = 1)) +
  theme(axis.text = element_text(size = 9),
        axis.title = element_text(size = 12)) +
  xlab("Longitude") +
  ylab("Latitude") +
  coord_sf(xlim = c(708000.9, 710500.3), ylim = c(4327200.8, 4330000.0), expand = FALSE)

The purpose of creating and saving these pdf markdown docs for each CSV file in the folder is for a visual QAQC check of the data from each sensor.

Answer:

R Markdown documents accept parameters, which you set when rendering the document to change how it runs. See the R Markdown documentation on parameterized reports for more detail.

In your RMarkdown document, set up your YAML to accept the parameter:

---
title: "`r tools::file_path_sans_ext(basename(params$datafile))` STIC Summary"
author: "Me"
date: '2022-09-08'
output: pdf_document
params:
    datafile: "sensor.csv"
---

Then, in the code, use that parameter to select the data file of interest, and run all your calculations/graphs:

library(tidyverse)
sensor_04M09 <- read_csv(params$datafile)

# And then put all your graphing code, etc., using the sensor_04M09 dataset
# (or name it something more generic)

Now you have an RMarkdown doc that can be given a filename and will produce your range of graphs using the data from that file. Save that as analysis_file.rmd or something.
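Because the report assumes every CSV has the same columns, it may help to fail early with a readable message when a file does not match. The following is only a sketch: `check_sensor_columns()` and `expected_cols` are hypothetical names, and the column list is simply copied from the example data frame in the question.

```r
# Column names taken from the example data frame in the question
expected_cols <- c("datetime", "siteID", "tempC", "SpC",
                   "wetdry", "lat", "long", "field_SpC")

# Hypothetical helper: stop with a readable message if any column is missing,
# otherwise return the data frame invisibly so it can be used in a pipeline
check_sensor_columns <- function(df, required = expected_cols) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(df)
}
```

Calling `check_sensor_columns(sensor_04M09)` right after `read_csv(params$datafile)` would stop the render for a malformed file before any plot is attempted.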

Finally, produce a script to loop over all the files you want it to run through. In a separate .R script:

library(rmarkdown)

# 'directory' is the folder with all your data files in it
# (the pattern is a regular expression, so anchor it to match only names ending in .csv)
list_of_files <- list.files('directory', pattern = '\\.csv$')
for (f in list_of_files) {
    # Get the name of the sensor alone (file name without ".csv"), for making filenames
    outputname <- paste0('analysis_of_', tools::file_path_sans_ext(f), '.pdf')
    # Get the full path of the data file
    full_file <- file.path('directory', f)
    render("analysis_file.rmd", output_file = outputname, params = list(datafile = full_file))
}

This will loop through all the files and render the document once for each.
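Since the question asks for a function, the loop could be wrapped up roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: `build_report_jobs()`, `render_sensor_reports()`, and the `out_dir` argument are hypothetical names introduced here. Only `rmarkdown::render()` and its `output_file`, `output_dir`, `params`, and `envir` arguments come from the rmarkdown package.

```r
# Hypothetical helper: pair each CSV path with the PDF name it should produce
build_report_jobs <- function(csv_files) {
  sensor <- tools::file_path_sans_ext(basename(csv_files))
  data.frame(
    datafile = csv_files,
    output_file = sprintf("analysis_of_%s.pdf", sensor),
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper: render the parameterized report once per CSV in data_dir
render_sensor_reports <- function(data_dir, rmd_file = "analysis_file.rmd",
                                  out_dir = "reports") {
  csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
  # Absolute paths, so read_csv() finds the data wherever the .Rmd is knit from
  jobs <- build_report_jobs(normalizePath(csv_files))
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  for (i in seq_len(nrow(jobs))) {
    rmarkdown::render(
      rmd_file,
      output_file = jobs$output_file[i],
      output_dir = out_dir,
      params = list(datafile = jobs$datafile[i]),
      envir = new.env()  # fresh environment so objects do not leak between reports
    )
  }
  invisible(jobs)
}
```

With this in place, `render_sensor_reports("directory")` would write one PDF per sensor into the assumed `reports` folder and return the table of jobs, which can be handy for checking that every CSV produced a report.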
