Using docker for only some processes in Nextflow

Time:02-02

I am writing a pipeline in Nextflow with multiple processes, most of which use Docker. Now I am trying to add a new process that only runs a Python script to preprocess some results, so no Docker image is needed.

However, I get the error "Missing container image for process 'my_python_process'".

I define the docker images in nextflow.config as follows:

process {
    withName: process1 {
        container = 'some/image1:1.0'
    }
    withName: process2 {
        container = 'some/image2:1.0'
    }
    withName: process3 {
        container = 'some/image3:1.0'
    }
}

docker {
    enabled = true
}

I found a discussion suggesting container = null for the process that doesn't need a container, but I still get the same error, no matter what the process script contains.

Does anyone know what I'm missing please? Thank you!

User response:

With docker.enabled = true, Nextflow tries to run every process in a Docker container created from the image given by that process's container directive. The error you're seeing occurs when no container has been specified for a particular process. The usual fix is to specify a 'base' or 'default' container for the whole workflow. You may want to choose one that comes with Python; otherwise, Ubuntu would be a good choice in my opinion.

Note that the withName process selector has the highest priority [1].

process {

    container = 'ubuntu:22.04'

    withName: my_python_process {
        container = 'python:3.9'
    }

    withName: process1 {
        container = 'some/image1:1.0'
    }
    withName: process2 {
        container = 'some/image2:1.0'
    }
    withName: process3 {
        container = 'some/image3:1.0'
    }
}

docker {
    enabled = true
}

I'm not aware of a way to disable Docker execution for a particular process, nor would you really want to [2]. The above approach should be preferred:

Containerization allows you to write self-contained and truly reproducible computational pipelines, by packaging the binary dependencies of a script into a standard and portable format that can be executed on any platform that supports a container runtime. Furthermore, the same pipeline can be transparently executed with any of the supported container runtimes, depending on which runtimes are available in the target compute environment.
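
To illustrate the selector priority noted earlier, here is a minimal sketch of a nextflow.config fragment. It assumes a hypothetical label named python attached to one or more processes, and the python:3.10 image is chosen only to make the override visible. As far as I understand the documented ordering, a withName setting overrides a withLabel setting, which in turn overrides the generic container setting:

```groovy
process {

    // Generic default: used by any process not matched by a selector below
    container = 'ubuntu:22.04'

    // Applies to every process that declares the (hypothetical) label 'python'
    withLabel: python {
        container = 'python:3.9'
    }

    // withName has the highest priority, so this image wins for this process
    // even if the process also carries the 'python' label
    withName: my_python_process {
        container = 'python:3.10'
    }
}

docker {
    enabled = true
}
```

A process opts in to the label by adding label 'python' to its definition. With this layout, a new script-only process picks up the default image automatically, so the "Missing container image" error does not occur.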
