I am trying to run a Nextflow pipeline, but the output file is not created.
The main.nf file looks like this:
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process my_script {
    """
    Rscript script.R
    """
}

workflow {
    my_script
}
In my nextflow.config I have:
process {
    executor = 'k8s'
    container = 'rocker/r-ver:4.1.3'
}
The script.R looks like this:
FUN <- readRDS("function.rds");
input = readRDS("input.rds");
output = FUN(
    singleCell_data_input = input[[1]], savePath = input[[2]], tmpDirGC = input[[3]]
);
saveRDS(output, "output.rds")
After running nextflow run main.nf, output.rds is not created.
CodePudding user response:
Nextflow processes run independently of one another, each isolated inside its own working directory. For your script to find the required input files, they must be localized inside that process working directory. Do this by defining an input block and declaring the files using the path qualifier, for example:
params.function_rds = './function.rds'
params.input_rds = './input.rds'

process my_script {

    // Both files are staged into the process working directory
    input:
    path my_function_rds
    path my_input_rds

    output:
    path "output.rds"

    script:
    """
    #!/usr/bin/env Rscript
    FUN <- readRDS("${my_function_rds}")
    input <- readRDS("${my_input_rds}")
    output <- FUN(
        singleCell_data_input = input[[1]],
        savePath = input[[2]],
        tmpDirGC = input[[3]]
    )
    saveRDS(output, "output.rds")
    """
}

workflow {
    function_rds = file( params.function_rds )
    input_rds = file( params.input_rds )

    my_script( function_rds, input_rds )
    my_script.out.view()
}
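If you want to check what actually got staged, you can look inside the task directories after a run. The following is only a rough sketch: show_task_dirs is a hypothetical helper name, and it assumes the default work directory layout (work/<prefix>/<hash>/) is reachable from where you run it, which may not be the case if your k8s setup puts the work directory on a different volume.

```shell
# Hypothetical helper: list the contents of every task directory under a
# Nextflow work directory and print any captured stderr.
# Assumes the default layout <work_root>/<prefix>/<hash>/.
show_task_dirs() {
    work_root="${1:-work}"
    for task_dir in "$work_root"/*/*/; do
        # Skip the unexpanded glob when no task directories exist
        [ -d "$task_dir" ] || continue
        echo "== ${task_dir}"
        # -A also shows the hidden .command.* files Nextflow writes
        ls -A "$task_dir"
        if [ -s "${task_dir}.command.err" ]; then
            echo "-- stderr:"
            cat "${task_dir}.command.err"
        fi
    done
}

show_task_dirs work
```

With the input block in place, function.rds and input.rds should appear in the listing for the task; without it, they will be missing, which is why readRDS could not find them.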
In the same way, the script itself would need to be localized inside the process working directory. To avoid specifying an absolute path to your R script (which would make your workflow non-portable), you can simply embed your code in the process, making sure to specify the Rscript shebang. This works because process scripts are not limited to Bash.
Another way would be to make your R script executable and move it into a directory called bin in the root directory of your project repository (i.e. the same directory as your 'main.nf' Nextflow script). Nextflow automatically adds this folder to the $PATH environment variable, so your script becomes accessible to each of your pipeline processes. For this to work, you'd need some way to pass in the input files as command line arguments. For example:
params.function_rds = './function.rds'
params.input_rds = './input.rds'

process my_script {

    input:
    path my_function_rds
    path my_input_rds

    output:
    path "output.rds"

    script:
    // script.R is found on the PATH because it lives in the project's bin directory
    """
    script.R "${my_function_rds}" "${my_input_rds}" output.rds
    """
}

workflow {
    function_rds = file( params.function_rds )
    input_rds = file( params.input_rds )

    my_script( function_rds, input_rds )
    my_script.out.view()
}
And your R script might look like:
#!/usr/bin/env Rscript
# Positional arguments: function RDS, input RDS, output path
args <- commandArgs(trailingOnly = TRUE)
FUN <- readRDS(args[1])
input <- readRDS(args[2])
output <- FUN(
    singleCell_data_input = input[[1]],
    savePath = input[[2]],
    tmpDirGC = input[[3]]
)
saveRDS(output, args[3])
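For the bin approach, setting up the directory could look something like the sketch below. move_to_bin is just a hypothetical helper name, and it assumes you run it from the project root (the directory containing main.nf) and that script.R currently sits there.

```shell
# Hypothetical helper: move a script into ./bin and mark it executable.
# Run it from the project root, i.e. the directory that contains main.nf.
move_to_bin() {
    script="$1"
    mkdir -p bin
    mv "$script" "bin/$(basename "$script")"
    # The executable bit plus the shebang line let processes call it by name
    chmod +x "bin/$(basename "$script")"
}

# Only act if script.R still sits next to main.nf
if [ -f script.R ]; then
    move_to_bin script.R
fi
```

The executable bit matters here: the process calls script.R directly rather than through Rscript, so the shebang on the first line of the script is what selects the interpreter.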