Loop through files, apply a function to each, and combine the results into a data frame in R

I have a directory of sorted BAM files and want to run the pileup function on each of them. pileup returns a data frame, and I would like to combine the results from all files into a single data frame.

For each file, I use the following code:

r16<-pileup(filename, index=filename, scanBamParam = ScanBamParam(), pileupParam = PileupParam())
r16$sample_id <- "sample id"

For the sample_id column, I would like the value to come from the file name. For example:

if the file name is file1.sorted.bam, sample_id should be file1.
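One way to derive the sample ID from a file name is to strip the directory and the .sorted.bam suffix with base R. This is only a sketch; the helper name sample_id_from_path is made up for illustration and is not part of Rsamtools.

```r
# Hypothetical helper: turn "path/to/file1.sorted.bam" into "file1".
sample_id_from_path <- function(path) {
  # basename() drops any directory part; sub() removes the trailing suffix.
  sub("\\.sorted\\.bam$", "", basename(path))
}

sample_id_from_path("file1.sorted.bam")
```

The pattern is anchored with $ so only a trailing .sorted.bam is removed; a name without that suffix is returned unchanged.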

After all files are processed, I would use rbind to get one big data frame and save it to an RData file.

So far I have tried a loop, but it does not give me any output.

library(pasillaBamSubset)
library(Rsamtools)
filenames<-Sys.glob("*.sorted.bam")
for (file in filenames) {
  output <- pileup(pileup(filenames, index=filenames, scanBamParam = ScanBamParam(), pileupParam = PileupParam()))
  save(output, file = "res.RData")
}

CodePudding user response：

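I am assuming that you want to stack all the data frames on top of each other (row bind). In your loop, pileup is called on the whole filenames vector rather than on file, it is wrapped in a second pileup call, and save overwrites res.RData on every iteration. map (from purrr) or lapply applies a function to each item of a list or vector (each file name in this case); map_dfr does the same and row-binds all the outputs.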

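library(Rsamtools)
library(purrr)

# list.files() takes a regular expression, not a glob; anchoring it avoids matching .bai index files
filenames <- list.files(pattern = "\\.sorted\\.bam$")

# Run pileup() on each file and row-bind the resulting data frames
output <- purrr::map_dfr(filenames, ~pileup(.x,
                                  index = .x,
                                  scanBamParam = ScanBamParam(),
                                  pileupParam = PileupParam()))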
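The question also asks for a sample_id column and a single saved RData file. A base R sketch of that pattern could look like the following. The names combine_per_file and fake_pileup are hypothetical, and fake_pileup is only a stand-in for pileup so the example runs without BAM files.

```r
# Hypothetical helper: run `per_file` on each path, tag the rows with the
# sample ID taken from the file name, and stack everything with rbind.
combine_per_file <- function(paths, per_file) {
  pieces <- lapply(paths, function(path) {
    df <- per_file(path)
    sample_id <- sub("\\.sorted\\.bam$", "", basename(path))
    df$sample_id <- rep(sample_id, nrow(df))
    df
  })
  do.call(rbind, pieces)
}

# Stand-in for pileup(): returns a tiny data frame regardless of the path.
fake_pileup <- function(path) {
  data.frame(pos = 1:2, count = c(10L, 20L))
}

res <- combine_per_file(c("file1.sorted.bam", "file2.sorted.bam"), fake_pileup)

# Save once, after all files are processed, so the file holds the full result.
out_file <- file.path(tempdir(), "res.RData")
save(res, file = out_file)
```

With Rsamtools loaded, passing Sys.glob("*.sorted.bam") as paths and function(f) pileup(f, index = f, scanBamParam = ScanBamParam(), pileupParam = PileupParam()) as per_file should produce the combined table described in the question, assuming each BAM file has its index next to it.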