Reading multiple data files and passing it into a function to plot

Time:06-07

I have multiple files that I want to plot as volcano plots. All of the files are in one folder.

Objective: I would like to read them in as a list and then pass each data set to a function that plots it.

The function I would like to use is this:

EnhancedVolcano(res1, lab = rownames(res1), x = "log2FoldChange", y = "padj",
                #selectLab = c("APOBEC3B","CHD7","AURKB","EYA1","UHRF1","SFMBT1"),
                xlim = c(-8, 8),
                xlab = bquote(~Log[2]~ "fold change"),
                ylab = bquote(~-Log[10]~adjusted~italic(P)),
                transcriptPointSize = 10,
                transcriptLabSize = 10,
                border = "full",
                pCutoff = 0.05,
                #legendPosition = "bottom",
                borderWidth = 1.5,
                legend = c('NS', 'Log2 FC', 'Adjusted p-value',
                           'Adjusted p-value & Log2 FC'),
                legendPosition = 'bottom',
                legendLabSize = 20,
                legendIconSize = 20,
                borderColour = "blue",
                #drawConnectors = FALSE,
                #widthConnectors = 0.01,
                colConnectors = 'grey30',
                gridlines.major = FALSE,
                gridlines.minor = FALSE)

This is the list of files I intend to use:

M0_vs_M1_TCGA_stages.txt  M0_vs_M4_TCGA_stages.txt  M1_vs_M3_TCGA_stages.txt  M2_vs_M3_TCGA_stages.txt  M3_vs_M4_TCGA_stages.txt
M0_vs_M2_TCGA_stages.txt  M0_vs_M5_TCGA_stages.txt  M1_vs_M4_TCGA_stages.txt  M2_vs_M4_TCGA_stages.txt  M3_vs_M5_TCGA_stages.txt
M0_vs_M3_TCGA_stages.txt  M1_vs_M2_TCGA_stages.txt  M1_vs_M5_TCGA_stages.txt  M2_vs_M5_TCGA_stages.txt  M4_vs_M5_TCGA_stages.txt

The general structure of each of my data frames is like this:

a <- dput(head(M0_vs_M1_TCGA_stages))
structure(list(gene = c("ENSG00000000003", "ENSG00000000971", 
"ENSG00000002726", "ENSG00000003989", "ENSG00000005381", "ENSG00000006534"
), Symbol = c("TSPAN6", "CFH", "AOC1", "SLC7A2", "MPO", "ALDH3B1"
), baseMean = c(18.692748982067, 464.265236194545, 109.22179823167, 
85.504528879087, 225281.306485184, 3135.38237206618), log2FoldChange = c(1.72011856334064, 
-1.84102137729838, -1.90294968540377, -2.38723703218791, -4.71693379158602, 
-1.50626419101949), lfcSE = c(0.521825206121688, 0.528072294508922, 
0.539428712863011, 0.661673608593429, 0.523148071429431, 0.26205630469554
), stat = c(3.29635008650678, -3.48630556164743, -3.52771300456717, 
-3.60787705778782, -9.0164411362497, -5.74786472994606), pvalue = c(0.00097949874464195, 
0.00048974125782849, 0.00041916635977159, 0.00030871270363637, 
1.94298755192739e-19, 9.03774951656819e-09), padj = c(0.0133044251543343, 
0.00833058768185816, 0.00750903801425802, 0.00609902023132708, 
3.7330619835181e-15, 3.94641548776874e-06), UP_DOWN = c("UP", 
"DOWN", "DOWN", "DOWN", "Low", "DOWN")), row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))

So I would like to pass each file (data set) to the above function, print each one as an individual plot, and retain the name of the file in the plot.

Any suggestions or help would be much appreciated.

My attempt so far

 make_volcano <- function(df){
      ggmaplot(df, main = expression("Group 1" %->% "Group 2"),
               fdr = 0.05, fc = 1, size = 0.4,
               palette = c("#B31B21", "#1465AC", "darkgray"),
               genenames = as.vector(df$Symbol),
               legend = "top", top = 0,
               font.label = c("bold", 11),
               font.legend = "bold",
               font.main = "bold",
               ggtheme = ggplot2::theme_minimal())
    }
    
    plots <- lapply(all_csv, make_volcano)

This does what I need, and it was not so complicated. I still need to figure out how to save each plot under its respective file name.

Improved version of my answer

bb <- all_csv


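## Preallocate the list, then fill it; seq_along() is also safe for an empty list
plot_list <- vector("list", length(bb))
for (i in seq_along(bb)) {
  plot_list[[i]] <- make_volcano(bb[[i]])
}


## Write all plots into one multi-page PDF, one plot per page
pdf("MAPLOT.pdf", height = 10, width = 15)

for (i in seq_along(bb)) {
  print(plot_list[[i]])
}
dev.off()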

The only thing I still need to add is each list element's name in its plot, so that the plots can be identified, although they are being plotted in order.

Answer:

I am not on my computer and don't have R available, thus this answer is more general and should just give an idea of the principle.

You seem to have solved the problem to read in the list of files and already have the list of data sets. And you have your plotting function. Well done.
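For readers who have not done the reading step yet, here is a minimal sketch of it. It assumes the files are tab-delimited with a header row, which the question does not state. The temporary folder and the two small demo files are made up for illustration so that the example runs on its own.

```r
## Demo setup only: write two small tab-delimited files into a temporary folder.
data_dir <- file.path(tempdir(), "tcga_demo")
dir.create(data_dir, showWarnings = FALSE)
demo <- data.frame(Symbol = c("TSPAN6", "CFH"),
                   log2FoldChange = c(1.72, -1.84),
                   padj = c(0.0133, 0.0083))
for (f in c("M0_vs_M1_TCGA_stages.txt", "M0_vs_M2_TCGA_stages.txt")) {
  write.table(demo, file.path(data_dir, f), sep = "\t",
              quote = FALSE, row.names = FALSE)
}

## Collect the file paths, then read each file into one list element.
## Assumption: tab-delimited text with a header line (read.delim defaults).
files <- list.files(data_dir, pattern = "_TCGA_stages\\.txt$", full.names = TRUE)
all_csv <- lapply(files, read.delim, stringsAsFactors = FALSE)

## Name each element after its file, without the extension.
names(all_csv) <- tools::file_path_sans_ext(basename(files))
```

Naming the list at read time means the file names travel with the data, which makes the later titling and saving steps simpler.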

I personally prefer the "apply" family for looping: the code is slightly shorter, I find it easier to read, and it carries no danger of "growing your vectors" (see also Patrick Burns's famous The R Inferno, chapter 2).

In your case, you could therefore simply write
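To illustrate what "growing" means, here is a small base R sketch with made-up numbers. It is not taken from the question; it only contrasts three ways of building the same result.

```r
## 1. Growing: the vector is extended (and copied) on every iteration.
squares_grown <- c()
for (i in 1:5) {
  squares_grown <- c(squares_grown, i^2)
}

## 2. Preallocating: the vector has its final length before the loop starts.
squares_prealloc <- numeric(5)
for (i in seq_along(squares_prealloc)) {
  squares_prealloc[i] <- i^2
}

## 3. apply family: vapply() allocates the result for you and checks its type.
squares_apply <- vapply(1:5, function(i) i^2, numeric(1))
```

All three give the same values. With a handful of plots the cost of growing is negligible, so this is mostly a matter of habit and readability.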

## lapply returns a list
lapply(all_csv, make_volcano)

This creates the list of plots. You now have several options to save them. You could combine them into one figure, most easily with the patchwork package:

plots <- lapply(all_csv, make_volcano)
patchwork::wrap_plots(plots)

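If you want to create separate files, your approach is perfectly fine. Another option might be to use ggsave, here again with lapply. Note that ggsave needs a file name for each plot, so loop over the indexes and pass the other arguments inside the function. The plot_<i>.pdf names below are only placeholders.

lapply(seq_along(plots), function(i) {
  ggsave(filename = paste0("plot_", i, ".pdf"), plot = plots[[i]],
         width = 15, device = "pdf")
})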

Naming is a bit trickier and certainly depends largely on the structure of your data set list. Is it a named list? What do you get when calling names(all_csv)?

You can use the names for the titles, as shown in this thread. This is also not the only thread on that topic; it is actually a fairly common problem here on Stack Overflow. The general idea is to loop over both the list and its names and assign the respective name to each plot. This can be achieved via indexing or with parallel looping functions such as mapply or purrr::map2. I generally like looping over indexes for those cases. You could for example do:

lapply(seq_along(all_csv), function(i) {
  ## I am here assuming that ggmaplot returns a ggplot object to which you can add a
  ## ggtitle layer - not sure if this really works. But hopefully you get the idea
  make_volcano(all_csv[[i]]) +
    ggtitle(names(all_csv)[i])
})
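The parallel-looping alternative mentioned above can be sketched with base R's Map(). This is only an illustration under assumptions: make_plot is a hypothetical stand-in for make_volcano, the two small data frames are made up, and ggplot2 is assumed to be installed.

```r
library(ggplot2)

## Hypothetical stand-in for make_volcano(): any function returning a ggplot object.
make_plot <- function(df) {
  ggplot(df, aes(x = log2FoldChange, y = -log10(padj))) +
    geom_point()
}

## Toy named list standing in for all_csv (values are made up).
all_csv <- list(
  M0_vs_M1 = data.frame(log2FoldChange = c(1.7, -1.8), padj = c(0.013, 0.008)),
  M0_vs_M2 = data.frame(log2FoldChange = c(-2.4, 0.5), padj = c(0.006, 0.2))
)

## Map() walks over the data frames and their names in parallel,
## so each plot receives the title that belongs to its data set.
plots <- Map(function(df, nm) make_plot(df) + ggtitle(nm),
             all_csv, names(all_csv))
```

Because Map() keeps the names of its first list argument, the resulting plots list is named as well, which is what the file-naming step relies on.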

The same idea of looping over the indexes of your names should work with ggsave, and you will get file names that match the read-in data files, provided the plots list is named.

lapply(seq_along(plots), function(i) {
  ## paste0() avoids the space that paste() would put before ".pdf"
  ggsave(plot = plots[[i]],
         filename = paste0(names(plots)[i], ".pdf"),
         width = 15)
})