Translating variables from bash to R

Time:09-22

I need to use ExomeDepth, which requires an R script.

However, until now I have been running the bash script below. It reads the bestcoverage_E036 file, which contains the list of file name IDs, and retrieves the ID on the line whose number matches the job array task ID. It works great for bash scripts.

#!/bin/bash --login
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH -p htc
#SBATCH --mail-type=ALL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --array=1-64

module load parallel
module load tool

EXOME_IDs_FILE=/home/bestcoverage_E036
INPUTFILE=/home/{}.bam

sed -n "${SLURM_ARRAY_TASK_ID}p" $EXOME_IDs_FILE | parallel -j 1 "tool $INPUTFILE"
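To see what the sed call does in isolation, here is a small sketch with a made-up three-line ID file. `sed -n "Np"` suppresses the normal output and prints only line N, so each array task picks up the ID on its own line. The IDs and the fallback task ID of 2 (used when the snippet runs outside SLURM) are assumptions for illustration only.

```shell
# Stand-in for the real ID list (hypothetical IDs, one per line)
ids_file=$(mktemp)
printf '%s\n' sampleA sampleB sampleC > "$ids_file"

# SLURM sets SLURM_ARRAY_TASK_ID inside an array job; fall back to 2 elsewhere
task_id="${SLURM_ARRAY_TASK_ID:-2}"

# -n turns off automatic printing; "Np" prints only line number N
picked_id=$(sed -n "${task_id}p" "$ids_file")
echo "task ${task_id} -> ${picked_id}"

rm -f "$ids_file"
```

Run outside SLURM, the fallback selects the second line of the list.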

However, I now need to use R for ExomeDepth. The documentation shows some of its usage as:

data(exons.hg19)
my.counts <- getBamCounts(bed.frame = exons.hg19,
                          bam.files = my.bam,
                          include.chr = FALSE,
                          referenceFasta = fasta)

I would like to use my bash variables in this example, so that, for instance, my.bam would be $INPUTFILE.

This obviously doesn't work, but the idea is something like this:

#!/bin/bash --login
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH -p htc
#SBATCH --mail-type=ALL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --array=1-64

module load parallel
module load tool

EXOME_IDs_FILE=/home/bestcoverage_E036
INPUTFILE=/home/{}.bam
HG38=/home/hg38.fasta
INPUTBEDFILE=/home/inputbed.bed

sed -n "${SLURM_ARRAY_TASK_ID}p" $EXOME_IDs_FILE | parallel -j 1 "data($INPUTBEDFILE)
                                                                 my.counts <- getBamCounts(bed.frame = $INPUTBEDFILE,
                                                                 bam.files = $INPUTFILE,
                                                                 include.chr = FALSE,
                                                                 referenceFasta = $HG38)

Does anyone know how to use bash variables in R code?

CodePudding user response:

You need a way to pass command-line arguments to your R script. There are a couple of libraries that help with this, as well as a very rudimentary base R function (commandArgs).

With the latter you have to do a lot of parsing and sense checking yourself, while libraries like getopt help you a lot with common tasks.

Having said that, here's an example using base R:

cli.R

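## base R: very simple, but a lot of manual parsing

# Arguments given after the script name, as character strings
base_args <- commandArgs(TRUE)

run_rnorm <- function(n, mean = NA, sd = NA) {
   # x converted to numeric, or y if that conversion yields NA
   `%!%` <- function(x, y) if (is.na(as.numeric(x))) y else as.numeric(x)
   args <- list(n = NULL,
                mean = NULL,
                sd = NULL)
   args$n <- as.numeric(n)
   # Assigning NULL drops the element, so rnorm() falls back to its default
   args$mean <- mean %!% NULL
   args$sd <- sd %!% NULL
   do.call(rnorm, args)
}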

stopifnot(`At least one parameter is needed` = length(base_args) > 0)
run_rnorm(base_args[1], base_args[2], base_args[3])

Then you can call it from bash like this:

Rscript cli.R 3

This gives you a way to pass (bash) variables from a script, like this

Rscript cli.R "$myvariable"

and in cli.R you can access it via commandArgs(TRUE)[1]. I do not know parallel, so you will have to work out how to fit the two together.
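One way this could fit together without parallel, as a minimal sketch: read the sample ID for the current array task with sed, build the BAM path in bash, and hand the paths to the R script as positional arguments. The script name exomedepth_counts.R, the sample IDs and the scratch directory are hypothetical stand-ins. The getBamCounts call is copied from the documentation excerpt in the question, and whether exons.hg19 is the right bed.frame for an hg38 reference is not something this sketch settles. The Rscript command is only printed here, so the snippet runs without R installed.

```shell
# Scratch directory with a stand-in for the ID list (hypothetical IDs)
workdir=$(mktemp -d)
ids_file="$workdir/ids.txt"
printf '%s\n' sampleA sampleB sampleC > "$ids_file"

# Write the R side. commandArgs(trailingOnly = TRUE) returns only the
# arguments given after the script name, as character strings.
r_script="$workdir/exomedepth_counts.R"   # hypothetical script name
cat > "$r_script" <<'EOF'
library(ExomeDepth)
args <- commandArgs(trailingOnly = TRUE)
stopifnot(length(args) == 2)
my.bam <- args[1]   # first argument: path to the BAM file
fasta  <- args[2]   # second argument: path to the reference FASTA
data(exons.hg19)
my.counts <- getBamCounts(bed.frame = exons.hg19,
                          bam.files = my.bam,
                          include.chr = FALSE,
                          referenceFasta = fasta)
EOF

# SLURM sets SLURM_ARRAY_TASK_ID inside an array job; fall back to 2 elsewhere
task_id="${SLURM_ARRAY_TASK_ID:-2}"
sample_id=$(sed -n "${task_id}p" "$ids_file")
bam_file="/home/${sample_id}.bam"
fasta_file="/home/hg38.fasta"

# Print the command; in the real job script, run it instead of echoing it
rscript_cmd="Rscript $r_script $bam_file $fasta_file"
echo "$rscript_cmd"
```

Because bash expands the variables before Rscript starts, the R code only ever sees plain strings in args, so no bash syntax needs to appear inside the R script.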
