Loop for two variables in nextflow block?

I'm trying to use a loop to run fastp in a Nextflow process, but I'm not sure how to set up a loop over two variables. I want --in1 and --in2 to be the forward and reverse reads of each pair, so that I get a set of output files for each pair.

#!/usr/bin/env nextflow

nextflow.enable.dsl=2

workflow {
    FASTP()
}

process FASTP {
    script:
    """
    fastp \\
        --in1 ${baseDir}/sequences/sequences_split/SRR19573234_R1.fastq \\
        --in2 ${baseDir}/sequences/sequences_split/SRR19573234_R2.fastq \\
        --out1 ${baseDir}/sequences/sequences_split/sequences_trimmed/trimmed_SRR19573234_R1.fastq \\
        --out2 ${baseDir}/sequences/sequences_split/sequences_trimmed/trimmed_SRR19573234_R2.fastq \\
        --html ${baseDir}/results/trimmed_SRR19573234.fastp.html

    fastp \\
        --in1 ${baseDir}/sequences/sequences_split/SRR19573260_R1.fastq \\
        --in2 ${baseDir}/sequences/sequences_split/SRR19573260_R2.fastq \\
        --out1 ${baseDir}/sequences/sequences_split/sequences_trimmed/trimmed_SRR19573260_R1.fastq \\
        --out2 ${baseDir}/sequences/sequences_split/sequences_trimmed/trimmed_SRR19573260_R2.fastq \\
        --html ${baseDir}/results/trimmed_SRR19573260.fastp.html
    """
}

CodePudding user response：

Create a tab-delimited text file with three columns (ID, R1, R2), one line per sample. Read the file, split each line into its fields, and use the resulting channel as the input for FASTP:

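nextflow.enable.dsl=2

// Path to the tab-delimited sample sheet; override with --fastqs on the command line
params.fastqs = "NO_FILE"

workflow {
    // Each row of the sheet is emitted as a list: [id, R1, R2]
    fch = Channel.fromPath(params.fastqs).splitCsv(header: false, sep: '\t')
    FASTP(fch)
}

process FASTP {
    input:
    // R1 and R2 are passed as plain strings, so the sheet should contain absolute paths
    tuple val(id), val(R1), val(R2)

    output:
    tuple val(id), path("trimmed_${id}_R1.fastq"), path("trimmed_${id}_R2.fastq"), path("trimmed_${id}.fastp.html")

    script:
    """
    fastp \\
        --in1 ${R1} \\
        --in2 ${R2} \\
        --out1 trimmed_${id}_R1.fastq \\
        --out2 trimmed_${id}_R2.fastq \\
        --html trimmed_${id}.fastp.html
    """
}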
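
If you don't want to write the tab-delimited file by hand, a small shell loop can generate it. This is only a sketch: the function name make_samplesheet and the output name samples.tsv are made up for illustration, and it assumes the reads are named <ID>_R1.fastq and <ID>_R2.fastq as in the question.

```shell
#!/usr/bin/env bash
# Sketch: print a tab-delimited sample sheet (ID, R1, R2) for paired FASTQ files.
# Assumes files are named <ID>_R1.fastq / <ID>_R2.fastq (hypothetical helper).
make_samplesheet() {
    local dir="$1"
    local r1 r2 id
    for r1 in "$dir"/*_R1.fastq; do
        [ -e "$r1" ] || continue           # glob matched nothing
        r2="${r1%_R1.fastq}_R2.fastq"      # derive the mate's path from R1
        [ -e "$r2" ] || continue           # skip reads without a mate
        id="$(basename "$r1" _R1.fastq)"   # sample ID is the shared prefix
        printf '%s\t%s\t%s\n' "$id" "$r1" "$r2"
    done
}
```

For example, run make_samplesheet "$PWD/sequences/sequences_split" > samples.tsv and then nextflow run main.nf --fastqs samples.tsv. Passing an absolute directory keeps the R1 and R2 columns absolute, which matters because the task runs in its own work directory and relative paths would not resolve there.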

CodePudding user response：

You can use the fromFilePairs factory method to create a channel that emits the file pairs matching a glob pattern. These are returned as tuples where the first element is the group key and the second element is the list of files (sorted lexicographically). For example:

params.reads = './sequences/sequences_split/*_R{1,2}.fastq'

process fastp {

    input:
    tuple val(sample), path(reads)

    output:
    tuple val(sample), path("trimmed_${sample}_R{1,2}.fastq"), emit: trimmed
    path("trimmed_${sample}.fastp.html"), emit: html

    script:
    def (r1, r2) = reads

    """
    fastp \\
        --in1 "${r1}" \\
        --in2 "${r2}" \\
        --out1 "trimmed_${sample}_R1.fastq" \\
        --out2 "trimmed_${sample}_R2.fastq" \\
        --html trimmed_${sample}.fastp.html
    """
}

workflow {

    readgroups = Channel.fromFilePairs( params.reads )

    fastp( readgroups )

    // do something with the trimmed reads
    fastp.out.trimmed.view()
}

Results:

$ nextflow run main.nf 
N E X T F L O W  ~  version 22.10.0
Launching `main.nf` [extravagant_mercator] DSL2 - revision: a2bffdf878
executor >  local (2)
[47/bc3a2e] process > fastp (1) [100%] 2 of 2 ✔
[SRR19573234, [/path/to/work/87/11dac1260431a174200e2d0df35754/trimmed_SRR19573234_R1.fastq, /path/to/work/87/11dac1260431a174200e2d0df35754/trimmed_SRR19573234_R2.fastq]]
[SRR19573260, [/path/to/work/47/bc3a2e4c149ba9d7dcb909ec668747/trimmed_SRR19573260_R1.fastq, /path/to/work/47/bc3a2e4c149ba9d7dcb909ec668747/trimmed_SRR19573260_R2.fastq]]