Home > OS >  Using Purrr to filter a dataset and plot a series of graphs using ggplot (in RMarkdown, for Shiny)
Using Purrr to filter a dataset and plot a series of graphs using ggplot (in RMarkdown, for Shiny)

Time:02-26

Background: I want to make a Shiny app that colleagues, who don't use R and don't have it installed, can upload .csv files to, and then download a report generated by the app, using their web browser. The report will be an editable word file which will include graphs plotted for each different group in the dataset provided (denoted by an identifier in a particular column).

Colleagues will upload one or more .csv files which will contain multiple datasets with the same column headings. These will ideally get combined into a single dataframe using rbind or similar.

The Shiny app will then initialise the RMarkdown report, which will identify all the distinct identifiers in a particular column, which I want to use in a purrr/map command to filter the dataset by each identifier, and plot a graph in ggplot for each. There should also be a title above each graph, and a sentence of text below each graph, which will also need to be produced by iterative functions.

I've put what I have so far below (and have tried various other permutations and commands along the way). I managed to get something like the below function to plot only the graphs, but want to amend it so that three functions (write header, plot graph, write sentence) are executed at once for each item in the list before the script moves to the next item, and am struggling with this. I also couldn't get that function to work with the annotate and xlim commands at the end of the ggplot sequence, but it would be really helpful if those worked, too.

Any help or advice would be greatly appreciated. Thank you so much.


  plot <- function(x) {
    report.data %>% 
      filter(identifier == .data[[.x]]) %>% 
      ggplot(aes(x = sample.no, y = result))  
      geom_point(aes(colour = analyser))  
      geom_hline(aes(yintercept = mean(result)   2 * sd(result)), colour = "red", linetype = "dashed")  
      geom_hline(aes(yintercept = mean(result) - 2 * sd(result)), colour = "red", linetype = "dashed")  
      xlab("Sample number")  
      ylab("Result")  
      theme_classic() # 
      #annotate("text", x = max(sample.no)   2, y = mean(result), size = 3.5)  
      #xlim(0, max(sample.no)   2)
  }   
  
funs <- c(
  header, 
  plot, 
  text
)

args <- list(unique(report.data$identifier))
report.data %>% map_df(~funs %>% map(exec, .x, !!!args))

Code to generate example input data:

library(dplyr)
set.seed(1234)

test1.level1.analyser1 <- data.frame(
  result = rnorm(25, mean = 2.5, sd = 0.2), 
  test = c("test1"), 
  level = c("level1"), 
  sample.no = c(1:25), 
  analyser = c("analyser1")
  )

test1.level1.analyser2 <- data.frame(
  result = rnorm(25, mean = 2.6, sd = 0.1), 
  test = c("test1"), 
  level = c("level1"), 
  sample.no = c(1:25), 
  analyser = c("analyser2")
  )

test1 <- rbind(test1.level1.analyser1, test1.level1.analyser2)

test2.level1.analyser1 <- data.frame(
  result = rnorm(25, mean = 10, sd = 2), 
  test = c("test2"), 
  level = c("level1"), 
  sample.no = c(1:25), 
  analyser = c("analyser1")
  )

test2.level1.analyser2 <- data.frame(
  result = rnorm(25, mean = 9.5, sd = 0.75), 
  test = c("test2"), 
  level = c("level1"), 
  sample.no = c(1:25), 
  analyser = c("analyser2"))

test2.level2.analyser1 <- data.frame(
  result = rnorm(25, mean = 30, sd = 1.8), 
  test = c("test2"), 
  level = c("level2"), 
  sample.no = c(1:25), 
  analyser = c("analyser1")
  )

test2.level2.analyser2 <- data.frame(
  result = rnorm(25, mean = 9.5, sd = 0.75), 
  test = c("test2"), 
  level = c("level2"), 
  sample.no = c(1:25), 
  analyser = c("analyser2"))
test2.level1 <- rbind(test2.level1.analyser1, test2.level1.analyser2)

test2 <- rbind(test2.level1.analyser1, test2.level1.analyser2, test2.level2.analyser1, test2.level2.analyser2)

input.data <- rbind(test1, test2) %>% mutate(identifier = paste(test, level, sep = " ")) 

CodePudding user response:

Would it be an option to integrate the individual titles above and sentences below the graph directly into ggplot? In that case I would suggest to modify the plot function like this:

library(ggplot2)
library(dplyr)
library(purrr)

my_plot <- function(df) {
    ggplot(df, aes(x = sample.no, y = result))  
    geom_point(aes(colour = analyser))  
    geom_hline(aes(yintercept = mean(result)   2 * sd(result)), colour = "red", linetype = "dashed")  
    geom_hline(aes(yintercept = mean(result) - 2 * sd(result)), colour = "red", linetype = "dashed")  
    theme_classic()  
    labs(
      # the title above the plot, based on information in the filtered df
      title = paste0("some title for identifier:  ", unique(df$identifier)),
      x = "Sample number",
      y = "Result",
      # the text below, based on data in the filtered data frame
      caption = paste0("A short sentence about the mean result (", round(mean(df$result), 2), ") below the plot.")
    )  
    coord_cartesian(xlim = c(0, max(df$sample.no)   2))      
    theme(
      # configure the caption / sentence below
      plot.caption=element_text(size=12, hjust = 0, margin = margin(t=20)),
      # add some buffer at bottom as spacing between plots
      plot.margin = margin(b=50)
    )
}

plot_list <- purrr::map(unique(input.data$identifier),
                           function(x) {
                             # filter data before passing it to the plot function
                             input.data %>% 
                               dplyr::filter(identifier == x) %>%
                               my_plot()
                           }
                       )

which produces a list of plots, which can be then be printed in a Rmd chunk like this.

```{r}
purrr::map(plot_list, ~plot(.x))

enter image description here

  • Related