I need to use ExomeDepth, which is an R package, so it has to run from an R script.
Until now I have been running the bash script below. It reads the file bestcoverage_E036, which contains a list of file-name IDs, and takes the ID on the line whose number matches the job array index. It works well for bash tools:
```bash
#!/bin/bash --login
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH -p htc
#SBATCH --mail-type=ALL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --array=1-64
module load parallel
module load tool
EXOME_IDs_FILE=/home/bestcoverage_E036
INPUTFILE=/home/{}.bam
sed -n "${SLURM_ARRAY_TASK_ID}p" $EXOME_IDs_FILE | parallel -j 1 "tool $INPUTFILE"
```
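To make the last line of that script concrete: bash expands `$INPUTFILE` inside the double quotes before GNU parallel runs, so parallel receives the template `tool /home/{}.bam` and replaces `{}` with the ID that sed printed. The sketch below reproduces that substitution in plain bash. `resolve_input` is a hypothetical helper written for illustration. It is not part of the job script.

```bash
#!/bin/bash
# resolve_input TEMPLATE ID_FILE LINE_NUMBER  (hypothetical helper)
# Prints TEMPLATE with every "{}" replaced by the ID on LINE_NUMBER of ID_FILE,
# mimicking what `sed -n "${N}p" ID_FILE | parallel -j 1 "TEMPLATE"` would run.
resolve_input() {
  local template=$1 id_file=$2 line=$3
  local placeholder='{}'
  local id
  # Same lookup as the job script: print only line number $line of the file.
  id=$(sed -n "${line}p" "$id_file")
  # Quoting "$placeholder" makes the pattern match the literal text {}.
  printf '%s\n' "${template//"$placeholder"/$id}"
}

# Example: with the IDs E001, E002, E003 in the file, array task 2 would run
#   tool /home/E002.bam
```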
Now I need to use R for ExomeDepth. The documentation shows usage like this:
```r
data(exons.hg19)
my.counts <- getBamCounts(bed.frame = exons.hg19,
                          bam.files = my.bam,
                          include.chr = FALSE,
                          referenceFasta = fasta)
```
I would like to use my bash variables in this code, so that, for example, my.bam would be $INPUTFILE.
This obviously doesn't work, but the idea is something like this:
```bash
#!/bin/bash --login
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH -p htc
#SBATCH --mail-type=ALL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --array=1-64
module load parallel
module load tool
EXOME_IDs_FILE=/home/bestcoverage_E036
INPUTFILE=/home/{}.bam
HG38=/home/hg38.fasta
INPUTBEDFILE=/home/inputbed.bed
sed -n "${SLURM_ARRAY_TASK_ID}p" $EXOME_IDs_FILE | parallel -j 1 "data($INPUTBEDFILE)
my.counts <- getBamCounts(bed.frame = $INPUTBEDFILE,
bam.files = $INPUTFILE,
include.chr = FALSE,
referenceFasta = $HG38)
```
Does anyone know how to use bash variables in R code?
Answer:
You need a way to pass command-line arguments to your R script. A couple of libraries help with that, and base R also has a very rudimentary function, `commandArgs()`.
With the latter you have to do a lot of parsing and sanity checking yourself, while libraries like `getopt` handle many common tasks for you.
That said, here is an example using base R:
`cli.R`:
```r
## base R very simple but a lot of manual parsing
base_args <- commandArgs(TRUE)
run_rnorm <- function(n, mean = NA, sd = NA) {
  `%!%` <- function(x, y) if (is.na(as.numeric(x))) y else as.numeric(x)
  args <- list(n = NULL,
               mean = NULL,
               sd = NULL)
  args$n <- as.numeric(n)
  args$mean <- mean %!% NULL
  args$sd <- sd %!% NULL
  do.call(rnorm, args)
}
stopifnot(`At least one parameter is needed` = length(base_args) > 0)
run_rnorm(base_args[1], base_args[2], base_args[3])
```
Then you can call it from bash like this:
```bash
Rscript cli.R 3
```
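`commandArgs(TRUE)` returns one string for each word the shell passes after the script name. So `Rscript cli.R 3 10 2` would fill `base_args[1]` to `base_args[3]`, giving `n = 3`, `mean = 10` and `sd = 2`. The shell does the word splitting, so quoting decides how a variable arrives in R. The sketch below shows this with `show_args`, a hypothetical stand-in for `Rscript cli.R` that prints each argument it receives on its own line.

```bash
#!/bin/bash
# show_args is a hypothetical stand-in for `Rscript cli.R`: it prints each
# argument it receives in brackets, one per line. commandArgs(TRUE) would see
# the same arguments as separate strings.
show_args() {
  printf '[%s]\n' "$@"
}

myvariable="3 10"

# Unquoted: bash splits the value on whitespace, giving two arguments.
show_args $myvariable       # prints [3] and [10] on separate lines

# Quoted: the whole value arrives as a single argument.
show_args "$myvariable"     # prints [3 10]
```

For file paths such as `$INPUTFILE`, quoting (`"$INPUTFILE"`) keeps a path that contains spaces as one argument.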
This means you can now pass bash variables from a script like this:
```bash
Rscript cli.R "$myvariable"
```
and in `cli.R` you can access the value via `commandArgs(TRUE)[1]`. I do not know `parallel`, so you will have to check how to fit the pieces together.
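Applied to the job array in the question, a minimal sketch could look like the script below. It assumes two things not confirmed by the question: a hypothetical R script named `exomedepth_counts.R`, and that R is loaded with `module load R` on this cluster. Each array task handles exactly one ID, so `parallel -j 1` does not appear to add anything here. The sketch therefore reads the ID with `sed` and calls `Rscript` directly, passing each path as a separate quoted argument.

```bash
#!/bin/bash --login
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH -p htc
#SBATCH --mail-type=ALL
#SBATCH --array=1-64

# Paths taken from the question; exomedepth_counts.R below is a hypothetical name.
EXOME_IDS_FILE=/home/bestcoverage_E036
HG38=/home/hg38.fasta
INPUTBEDFILE=/home/inputbed.bed

# get_exome_id FILE N: print line N of FILE (the ID for array task N).
get_exome_id() {
  sed -n "${2}p" "$1"
}

# Only run the R step inside a SLURM array task; outside SLURM this file
# just defines the helper above.
if [[ -n "${SLURM_ARRAY_TASK_ID:-}" ]]; then
  module load R   # assumed module name; check `module avail` on your cluster
  EXOME_ID=$(get_exome_id "$EXOME_IDS_FILE" "$SLURM_ARRAY_TASK_ID")
  if [[ -z "$EXOME_ID" ]]; then
    echo "No ID on line $SLURM_ARRAY_TASK_ID of $EXOME_IDS_FILE" >&2
    exit 1
  fi
  INPUTFILE="/home/${EXOME_ID}.bam"
  # Each quoted word becomes one element of commandArgs(trailingOnly = TRUE).
  Rscript exomedepth_counts.R "$INPUTFILE" "$INPUTBEDFILE" "$HG38"
fi
```

Inside `exomedepth_counts.R`, `args <- commandArgs(trailingOnly = TRUE)` would then hold the BAM path in `args[1]`, the BED path in `args[2]` and the FASTA path in `args[3]`. `bed.frame` expects a data frame (like `exons.hg19`), not a path. The BED file would therefore need to be read into R first, for example with `read.table`. Check the ExomeDepth documentation for the columns it expects.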