I have hundreds of CSV files and would like to plot a graph for each of them. I've searched through the forum and found something I can use, but it still needs some editing. The code is originally from Plotting multiple graphs from multiple .csv files using R.
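library(dplyr)
library(ggplot2)
# full.names = TRUE returns paths that read.csv() can open from any working directory
list_of_dfs = lapply(list.files('path/to/files', pattern = '\\.csv$', full.names = TRUE),
  function(x) {
    dat = read.csv(x)
    dat$fname = basename(x)
    return(dat)
  })
one_big_df = list_of_dfs %>% bind_rows()
one_big_df %>% ggplot(aes(x = x, y = y)) + geom_point() + facet_wrap(~ fname)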
It works fine except I need to save all the graphs separately.
I also need to analyse the graphs by overlaying them according to their suffixes. Is it possible to incorporate this in the code?
Example file names:
MAX_C1-B3.csv
MAX_C2-B3.csv
MAX_C1-B4.csv
MAX_C2-B4.csv
...
So the files with B3 should be in one graph and those with B4 in another.
Thanks for your help in advance!
CodePudding user response:
I am not sure that the following is what the question is asking for.
The main method is always the same:
- split the data with the base function split(). This creates a named list;
- pipe the resulting list to seq_along() to get index numbers into the list. This allows for access to the list's names attribute and to compose filenames according to them;
- pipe the numbers to purrr::map and plot each list member separately;
- save the results to disk.
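The steps above can be sketched with base R only, on a small made-up data frame. Here lapply stands in for purrr::map, and the plotting step is replaced by composing the output file name, so nothing is written to disk. The data frame toy and its values are assumptions for illustration.

```r
# Made-up data: two "files" with two rows each.
toy <- data.frame(
  x = 1:4,
  y = c(2.5, 3.1, 4.8, 5.0),
  fname = c("MAX_C1-B3", "MAX_C1-B3", "MAX_C2-B3", "MAX_C2-B3")
)

# 1. split() returns a list named after the values of the grouping variable.
toy_list <- split(toy, toy$fname)

# 2. seq_along() gives the index numbers into that list.
# 3. Each index gives access to both the name and the data of one member.
out_files <- lapply(seq_along(toy_list), function(i) {
  graph_name <- names(toy_list)[i]
  DF <- toy_list[[i]]
  # A real run would build and save a plot of DF here.
  paste0(graph_name, ".pdf")
})

unlist(out_files)
```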
First load the packages needed.
suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
  library(purrr)
})
This is a common function to save the plots.
save_plot <- function(graph, graph_name, type = "") {
  # file name depends on suffix and on directory structure
  # the files are to be saved to a temp directory
  # (it's just a code test)
  # drop a trailing extension such as ".csv", if there is one
  graph_name <- tools::file_path_sans_ext(graph_name)
  if(type != "")
    graph_name <- paste0(graph_name, "_", type)
  filename <- paste0(graph_name, ".pdf")
  filename <- file.path("~/Temp", filename)
  ggsave(filename, graph, device = "pdf")
}
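The function above writes to ~/Temp, so that directory has to exist before the plots are saved. A minimal sketch of creating an output directory first, using a hypothetical out_dir under tempdir() rather than ~/Temp:

```r
# Hypothetical output directory; the function above writes to "~/Temp" instead.
out_dir <- file.path(tempdir(), "plots")

# Create the directory (and any missing parents) if it is not there yet;
# showWarnings = FALSE keeps the call quiet when it already exists.
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Compose a file name inside that directory.
filename <- file.path(out_dir, paste0("MAX_C1-B3", ".pdf"))
```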
1. Plot all graphs separately
From the question:
It works fine except I need to save all the graphs separately.
Does this mean that the graphs corresponding to each file are to be saved separately? If yes, then the following code plots and saves them to files whose names have the extension .csv changed to .pdf.
list_dfs_by_fname <- split(one_big_df, one_big_df$fname)
list_dfs_by_fname %>%
  seq_along() %>%
  map(.f = \(i) {
    graph_name <- names(list_dfs_by_fname)[i]
    DF <- list_dfs_by_fname[[i]]
    graph <- DF %>%
      ggplot(aes(x = x, y = y)) +
      geom_point()
    save_plot(graph, graph_name)
  })
2. Plot by suffix
First create a new column with either the suffix "B3" or the suffix "B4". Then split the data by the groups so defined. The split data is needed for the two plots that follow.
# allow for an optional ".csv" extension after the suffix
inx <- grepl("B4(\\.csv)?$", one_big_df$fname)
one_big_df$group <- c("B3", "B4")[inx + 1L]
list_dfs_by_suffix <- split(one_big_df, one_big_df$group)
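With hundreds of files there may be more suffixes than B3 and B4. A possible sketch, assuming every file name ends in a hyphen followed by the suffix and an optional .csv extension (as in the question's examples), extracts the suffix with a regular expression instead of hard-coding two values. The vector fnames is made up for illustration.

```r
# Made-up file names following the pattern in the question.
fnames <- c("MAX_C1-B3.csv", "MAX_C2-B3.csv", "MAX_C1-B4.csv", "MAX_C2-B4.csv")

# Drop an optional ".csv" extension, then everything up to the last hyphen.
no_ext <- sub("\\.csv$", "", fnames)
group <- sub("^.*-", "", no_ext)

group
table(group)
```

Applied to the data, the same two sub() calls on one_big_df$fname would fill the group column, and split() would then return one list member per suffix found.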
2.1. Plot by suffix, overlapped
To have the groups of fname overlap, map that variable to the color aesthetic.
list_dfs_by_suffix %>%
  seq_along() %>%
  map(.f = \(i) {
    graph_name <- names(list_dfs_by_suffix)[i]
    DF <- list_dfs_by_suffix[[i]]
    graph <- DF %>%
      ggplot(aes(x = x, y = y, color = fname)) +
      geom_point()
    save_plot(graph, graph_name, type = "overlapped")
  })
2.2. Plot by suffix, faceted
If the plots are faceted by fname, the code is copied and pasted from the question's, with scales = "free" added.
list_dfs_by_suffix %>%
  seq_along() %>%
  map(.f = \(i) {
    graph_name <- names(list_dfs_by_suffix)[i]
    DF <- list_dfs_by_suffix[[i]]
    graph <- DF %>%
      ggplot(aes(x = x, y = y)) +
      geom_point() +
      facet_wrap( ~ fname, scales = "free")
    save_plot(graph, graph_name, "faceted")
  })
Test data
Use the built-in data sets iris and mtcars to test the code.
Only the last two instructions matter to the question: they check the data set one_big_df's column names and the values in fname.
suppressPackageStartupMessages({
  library(dplyr)
})
df1 <- iris[3:5]
df2 <- mtcars[c("hp", "qsec", "cyl")]
names(df1) <- c("x", "y", "categ")
names(df2) <- c("x", "y", "categ")
df2$categ <- factor(df2$categ)
sp1 <- split(df1[1:2], df1$categ)
sp2 <- split(df2[1:2], df2$categ)
names(sp1) <- sprintf("MAX_C%d-B3", seq_along(sp1))
names(sp2) <- sprintf("MAX_C%d-B4", seq_along(sp2))
list_of_dfs <- c(sp1, sp2)
list_of_dfs <- lapply(seq_along(list_of_dfs), \(i) {
  list_of_dfs[[i]]$fname <- names(list_of_dfs)[i]
  list_of_dfs[[i]]
})
one_big_df <- list_of_dfs %>% dplyr::bind_rows()
names(one_big_df)
#> [1] "x" "y" "fname"
unique(one_big_df$fname)
#> [1] "MAX_C1-B3" "MAX_C2-B3" "MAX_C3-B3" "MAX_C1-B4" "MAX_C2-B4" "MAX_C3-B4"
Created on 2022-05-31 by the reprex package (v2.0.1)
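To exercise the file-reading step from the question as well, test data can be written to CSV files first and read back. A minimal sketch, assuming a scratch directory under tempdir() and made-up file contents taken from iris; csv_dir and the two file names are hypothetical, and do.call(rbind, ...) stands in for dplyr::bind_rows so that only base R is needed.

```r
# Hypothetical scratch directory for the CSV files.
csv_dir <- file.path(tempdir(), "csv_test")
dir.create(csv_dir, showWarnings = FALSE)

# Two small data frames with the column names x and y used in the plots.
df_b3 <- setNames(iris[1:5, 3:4], c("x", "y"))
df_b4 <- setNames(iris[51:55, 3:4], c("x", "y"))
write.csv(df_b3, file.path(csv_dir, "MAX_C1-B3.csv"), row.names = FALSE)
write.csv(df_b4, file.path(csv_dir, "MAX_C1-B4.csv"), row.names = FALSE)

# Read the files back; full.names = TRUE returns paths that read.csv() can
# open from any working directory, and basename() keeps only the file name.
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
list_of_dfs <- lapply(files, function(f) {
  dat <- read.csv(f)
  dat$fname <- basename(f)
  dat
})
one_big_df <- do.call(rbind, list_of_dfs)

names(one_big_df)
unique(one_big_df$fname)
```

Note that fname keeps the .csv extension here, so any pattern that looks for the suffix at the end of the name has to allow for it.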