Plot multiple graphs from multiple csv and follow up analysis

Time:06-07

I have several hundred csv files and would like to plot a graph for each of them. I've searched through the forum and found some code that I can use, but it still needs some editing. The code is originally from Plotting multiple graphs from multiple .csv files using R.

library(dplyr)
library(ggplot2)
list_of_dfs = lapply(list.files('path/to/files', pattern = '*csv'), 
    function(x) {
        dat = read.csv(x)
        dat$fname = x
        return(dat)
    })
one_big_df = list_of_dfs %>% bind_rows()
one_big_df %>% ggplot(aes(x = x, y = y)) + geom_point() + facet_wrap(~ fname)

It works fine except I need to save all the graphs separately.

I also need to analyse the graphs by overlaying them according to their suffixes. Is it possible to incorporate that into the code?

Example file names:

MAX_C1-B3.csv
MAX_C2-B3.csv
MAX_C1-B4.csv
MAX_C2-B4.csv
...

So the files with B3 should be in one graph and those with B4 in another.

Thanks for your help in advance!

Answer:

I am not sure that the following is what the question is asking for.
The main method is always the same:

  1. split the data with the base function split. This creates a named list;
  2. pipe the resulting list to seq_along to get index numbers into the list. The indices give access both to each list member and to the list's names attribute, from which the file names are composed;
  3. pipe the numbers to purrr::map and plot each list member separately;
  4. save the results to disk.

First load the packages needed.

suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
  library(purrr)
})

This is a common function to save the plots.

save_plot <- function(graph, graph_name, type = "") {
  # file name depends on suffix and on directory structure
  # the files are to be saved to a temp directory
  # (it's just a code test)
  # drop any directory part and a trailing ".csv" so that
  # "MAX_C1-B3.csv" is saved as "MAX_C1-B3.pdf"
  graph_name <- sub("\\.csv$", "", basename(graph_name))
  if(type != "") 
    graph_name <- paste0(graph_name, "_", type)
  filename <- paste0(graph_name, ".pdf")
  filename <- file.path("~/Temp", filename)
  ggsave(filename, graph, device = "pdf")
}
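
The function above writes into "~/Temp", and depending on the ggplot2 version ggsave may stop or prompt if that directory does not exist. A minimal sketch of creating the output directory beforehand follows; out_dir and the "plots" folder name are made up for illustration and are not part of the original code.

```r
# Assumption: plots go to a "plots" folder under the session's temp directory.
# out_dir is a hypothetical name; swap in the real target (e.g. "~/Temp").
out_dir <- file.path(tempdir(), "plots")

# Create the directory (and any missing parents) only if it is not there yet.
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}
```

The resulting path can then be used in place of "~/Temp" inside file.path.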

1. Plot all graphs separately

From the question:

It works fine except I need to save all the graphs separately.

Does this mean that the graph corresponding to each file is to be saved separately? If so, the following code plots each one and saves it to a file whose name has the extension .csv changed to .pdf.

list_dfs_by_fname <- split(one_big_df, one_big_df$fname)
list_dfs_by_fname %>%
  seq_along() %>% 
  map(.f = \(i) {
    graph_name <- names(list_dfs_by_fname)[i]
    DF <- list_dfs_by_fname[[i]]
    
    graph <- DF %>% 
      ggplot(aes(x = x, y = y)) +
      geom_point()
    
    save_plot(graph, graph_name)
  })
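
As a possible alternative to piping through seq_along, purrr::iwalk passes each list member together with its name and is meant for side effects such as saving files. The sketch below is not the code above, only a variant under stated assumptions: toy_list stands in for list_dfs_by_fname, and the files are written to tempdir() instead of going through save_plot.

```r
library(ggplot2)
library(purrr)

# Toy stand-in for list_dfs_by_fname: a named list of data frames
# with columns x and y (the values are made up for illustration).
toy_list <- list(
  "MAX_C1-B3" = data.frame(x = 1:5, y = (1:5)^2),
  "MAX_C2-B3" = data.frame(x = 1:5, y = 5:1)
)

out_dir <- tempdir()  # assumption: write to the session's temp directory

# iwalk() calls the function with (element, name) for every list member.
iwalk(toy_list, \(DF, graph_name) {
  graph <- ggplot(DF, aes(x = x, y = y)) +
    geom_point()
  ggsave(file.path(out_dir, paste0(graph_name, ".pdf")), graph, device = "pdf")
})
```

Because iwalk returns its input invisibly, nothing is printed to the console apart from the messages emitted by ggsave.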

2. Plot by suffix

First create a new column with either the suffix "B3" or the suffix "B4". Then split the data by groups so defined. The split data is needed for the two plots that follow.

# match the suffix at the end of the name, with or without a ".csv" extension
inx <- grepl("B4(\\.csv)?$", one_big_df$fname)
one_big_df$group <- c("B3", "B4")[inx + 1L]
list_dfs_by_suffix <- split(one_big_df, one_big_df$group)
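
The indexing above assumes that only the suffixes B3 and B4 occur. If there are more of them, as the "..." in the question's file list may suggest, one possible approach is to extract whatever follows the last hyphen. This is only a sketch: the regular expression assumes names shaped like those in the question, and fname, suffix and toy_df are made-up names for illustration.

```r
# File names shaped like the ones in the question, with and without ".csv".
fname <- c("MAX_C1-B3.csv", "MAX_C2-B3.csv", "MAX_C1-B4.csv", "MAX_C2-B10")

# Keep the text after the last hyphen, dropping an optional ".csv" extension.
suffix <- sub("^.*-([^.-]+)(\\.csv)?$", "\\1", fname)

# Toy data frame standing in for one_big_df.
toy_df <- data.frame(fname = fname, x = 1:4, y = 4:1)
toy_df$group <- suffix

# One list member per distinct suffix, however many there are.
toy_by_suffix <- split(toy_df, toy_df$group)
```

The same two lines (sub, then split) could be applied to one_big_df$fname in place of the grepl-based grouping.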

2.1. Plot by suffix, overlapped

To have the groups of fname overlap, map that variable to the color aesthetic.

list_dfs_by_suffix %>% 
  seq_along() %>% 
  map(.f = \(i) {
    graph_name <- names(list_dfs_by_suffix)[i]
    DF <- list_dfs_by_suffix[[i]]
    
    graph <- DF %>% 
      ggplot(aes(x = x, y = y, color = fname)) +
      geom_point()
    
    save_plot(graph, graph_name, type = "overlapped")
  })

2.2. Plot by suffix, faceted

To facet the plots by fname instead, the plotting code is copied from the question, with scales = "free" added.

list_dfs_by_suffix %>% 
  seq_along() %>% 
  map(.f = \(i) {
    graph_name <- names(list_dfs_by_suffix)[i]
    DF <- list_dfs_by_suffix[[i]]
    
    graph <- DF %>% 
      ggplot(aes(x = x, y = y)) +
      geom_point() +
      facet_wrap( ~ fname, scales = "free")
    
    save_plot(graph, graph_name, "faceted")
  })

Test data

Use the built-in data sets iris and mtcars to test the code.
Only the last two instructions matter to the question: they check the column names of the data set one_big_df and the values in fname.

suppressPackageStartupMessages({
  library(dplyr)
})

df1 <- iris[3:5]
df2 <- mtcars[c("hp", "qsec", "cyl")]
names(df1) <- c("x", "y", "categ")
names(df2) <- c("x", "y", "categ")
df2$categ <- factor(df2$categ)
sp1 <- split(df1[1:2], df1$categ)
sp2 <- split(df2[1:2], df2$categ)
names(sp1) <- sprintf("MAX_C%d-B3", seq_along(sp1))
names(sp2) <- sprintf("MAX_C%d-B4", seq_along(sp2))

list_of_dfs <- c(sp1, sp2)
list_of_dfs <- lapply(seq_along(list_of_dfs), \(i) {
  list_of_dfs[[i]]$fname <- names(list_of_dfs)[i]
  list_of_dfs[[i]]
})
one_big_df <- list_of_dfs %>% dplyr::bind_rows()
names(one_big_df)
#> [1] "x"     "y"     "fname"
unique(one_big_df$fname)
#> [1] "MAX_C1-B3" "MAX_C2-B3" "MAX_C3-B3" "MAX_C1-B4" "MAX_C2-B4" "MAX_C3-B4"

Created on 2022-05-31 by the reprex package (v2.0.1)
