I have a directory of sorted BAM files that I want to run the pileup function on. The output of pileup is a data frame, and I would like to combine the results from all files into a single data frame.
For each file, I use the following code:
r16<-pileup(filename, index=filename, scanBamParam = ScanBamParam(), pileupParam = PileupParam())
r16$sample_id <- "sample id"
For the sample_id column, I would like the value to be the name of the file. For example:
- if the file name is file1.sorted.bam, I would like sample_id to be file1
After all files are processed, I would use rbind to get one big data frame and save it to an RData file.
So far, I have tried to use a loop, but it is not giving me any output.
library(pasillaBamSubset)
library(Rsamtools)
filenames<-Sys.glob("*.sorted.bam")
for (file in filenames) {
  output <- pileup(pileup(filenames, index=filenames, scanBamParam = ScanBamParam(), pileupParam = PileupParam()))
  save(output, file = "res.RData")
}
Answer:
I am assuming that you want to stack all the data frames on top of each other (row bind). map (from purrr) or lapply can apply a function to each item in a given list or vector (each file name in this case). map_dfr does the same and row binds all the outputs.
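library(Rsamtools)
library(purrr)
# list.files() takes a regular expression, not a glob; the trailing $ skips .bam.bai index files
filenames <- list.files(pattern = "\\.sorted\\.bam$")
purrr::map_dfr(filenames, ~pileup(.x,
                                  index = .x,
                                  scanBamParam = ScanBamParam(),
                                  pileupParam = PileupParam()))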
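
The call above does not yet add the sample_id column or save the result, which the question also asks for. Here is a minimal base R sketch of one way to do both, assuming every file ends in .sorted.bam and that the default ScanBamParam() and PileupParam() settings are what you want. The helper names sample_name and pileup_one, and the pileup_fun argument, are made up for this example and are not part of Rsamtools.

```r
# Strip the directory and the ".sorted.bam" suffix to get the sample name,
# e.g. "file1.sorted.bam" becomes "file1".
sample_name <- function(file) sub("\\.sorted\\.bam$", "", basename(file))

# Run pileup on one BAM file and tag every row with its sample name.
# `pileup_fun` is a hypothetical hook that defaults to Rsamtools::pileup;
# it only exists so the logic can be tried without real BAM files.
pileup_one <- function(file, pileup_fun = Rsamtools::pileup) {
  # scanBamParam and pileupParam are left at their defaults here.
  res <- pileup_fun(file, index = file)
  # rep() keeps the assignment valid even when a file yields zero rows.
  res$sample_id <- rep(sample_name(file), nrow(res))
  res
}

# The pattern is a regular expression; the trailing $ skips .bam.bai index files.
filenames <- list.files(pattern = "\\.sorted\\.bam$")

# One data frame per file, then row bind them all and save once, after the loop.
all_pileups <- do.call(rbind, lapply(filenames, pileup_one))
save(all_pileups, file = "res.RData")
```

Calling load("res.RData") in a later session restores the combined data frame under the name all_pileups. Saving once after all files are processed avoids overwriting res.RData on every iteration, which is what the loop in the question would do.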