I'm trying to use a loop to run fastp inside a Nextflow process, but I'm not sure how to set up a loop over two variables. I want --in1 and --in2 to take the forward and reverse reads of each pair, so that I get an output file for each sample.
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

workflow {
    FASTP()
}

process FASTP {
    script:
    """
    fastp \\
        --in1 ${baseDir}/sequences/sequences_split/SRR19573234_R1.fastq \\
        --in2 ${baseDir}/sequences/sequences_split/SRR19573234_R2.fastq \\
        --out1 ${baseDir}/sequences/sequences_split/sequences_trimmed/trimmed_SRR19573234_R1.fastq \\
        --out2 ${baseDir}/sequences/sequences_split/sequences_trimmed/trimmed_SRR19573234_R2.fastq \\
        --html ${baseDir}/results/trimmed_SRR19573234.fastp.html

    fastp \\
        --in1 ${baseDir}/sequences/sequences_split/SRR19573260_R1.fastq \\
        --in2 ${baseDir}/sequences/sequences_split/SRR19573260_R2.fastq \\
        --out1 ${baseDir}/sequences/sequences_split/sequences_trimmed/trimmed_SRR19573260_R1.fastq \\
        --out2 ${baseDir}/sequences/sequences_split/sequences_trimmed/trimmed_SRR19573260_R2.fastq \\
        --html ${baseDir}/results/trimmed_SRR19573260.fastp.html
    """
}
Answer:
Create a tab-delimited text file with three columns (ID, R1, R2), read it with splitCsv, and use the resulting channel as the input for FASTP:
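nextflow.enable.dsl=2

params.fastqs = "NO_FILE"

workflow {
    fch = Channel.fromPath(params.fastqs)
        .splitCsv(header: false, sep: '\t')
        // file() turns the path strings from the TSV into file objects,
        // so Nextflow stages the reads into each task's work directory
        .map { id, r1, r2 -> tuple(id, file(r1), file(r2)) }
    FASTP(fch)
}

process FASTP {
    input:
    tuple val(id), path(R1), path(R2)

    output:
    // use `id` exactly as declared above: Groovy variable names are case-sensitive
    tuple val(id), path("trimmed_${id}_R1.fastq"), path("trimmed_${id}_R2.fastq"), path("trimmed_${id}.fastp.html")

    script:
    """
    fastp \\
        --in1 ${R1} \\
        --in2 ${R2} \\
        --out1 trimmed_${id}_R1.fastq \\
        --out2 trimmed_${id}_R2.fastq \\
        --html trimmed_${id}.fastp.html
    """
}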
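The tab-delimited file this answer reads can be generated with a short bash loop rather than written by hand. This is a minimal sketch, assuming every sample sits in one directory and follows the <ID>_R1.fastq / <ID>_R2.fastq naming from the question; make_samplesheet and samples.tsv are hypothetical names. It writes absolute paths so the samplesheet stays valid no matter where the pipeline is launched from, and it skips (with a warning) any R1 file that has no matching R2.

```shell
#!/usr/bin/env bash
# Sketch: build a tab-delimited samplesheet (ID, R1, R2) for the FASTP workflow.
# Assumes mates are named <ID>_R1.fastq and <ID>_R2.fastq in one directory.
set -euo pipefail

make_samplesheet() {
  local dir="$1" out="$2" r1 id abs
  : > "$out"                                # create/truncate the output file
  for r1 in "$dir"/*_R1.fastq; do
    [ -e "$r1" ] || continue                # the glob matched nothing
    id=$(basename "$r1" _R1.fastq)          # SRR19573234_R1.fastq -> SRR19573234
    if [ ! -e "$dir/${id}_R2.fastq" ]; then
      echo "warning: no R2 mate for ${id}, skipping" >&2
      continue
    fi
    abs=$(cd "$dir" && pwd)                 # absolute path of the directory
    printf '%s\t%s\t%s\n' "$id" "$abs/${id}_R1.fastq" "$abs/${id}_R2.fastq" >> "$out"
  done
}

# Hypothetical defaults: the question's read directory and samples.tsv in the cwd.
make_samplesheet "${1:-sequences/sequences_split}" "${2:-samples.tsv}"
```

Run it from the project directory, then pass the result to the workflow with `nextflow run main.nf --fastqs samples.tsv`.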
Answer:
You can use the fromFilePairs factory method to create a channel that emits the file pairs matching a glob pattern. Each pair is emitted as a tuple whose first element is the group key (here, the sample ID) and whose second element is the list of matching files, sorted lexicographically, so the _R1 file always comes before the _R2 file. For example:
params.reads = './sequences/sequences_split/*_R{1,2}.fastq'

process fastp {
    input:
    tuple val(sample), path(reads)

    output:
    tuple val(sample), path("trimmed_${sample}_R{1,2}.fastq"), emit: trimmed
    path("trimmed_${sample}.fastp.html"), emit: html

    script:
    def (r1, r2) = reads
    """
    fastp \\
        --in1 "${r1}" \\
        --in2 "${r2}" \\
        --out1 "trimmed_${sample}_R1.fastq" \\
        --out2 "trimmed_${sample}_R2.fastq" \\
        --html trimmed_${sample}.fastp.html
    """
}

workflow {
    readgroups = Channel.fromFilePairs( params.reads )
    fastp( readgroups )

    // do something with the trimmed reads
    fastp.out.trimmed.view()
}
Results:
$ nextflow run main.nf
N E X T F L O W ~ version 22.10.0
Launching `main.nf` [extravagant_mercator] DSL2 - revision: a2bffdf878
executor > local (2)
[47/bc3a2e] process > fastp (1) [100%] 2 of 2 ✔
[SRR19573234, [/path/to/work/87/11dac1260431a174200e2d0df35754/trimmed_SRR19573234_R1.fastq, /path/to/work/87/11dac1260431a174200e2d0df35754/trimmed_SRR19573234_R2.fastq]]
[SRR19573260, [/path/to/work/47/bc3a2e4c149ba9d7dcb909ec668747/trimmed_SRR19573260_R1.fastq, /path/to/work/47/bc3a2e4c149ba9d7dcb909ec668747/trimmed_SRR19573260_R2.fastq]]
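Here the group key is the part of the file name matched by `*` (SRR19573234 and SRR19573260 above). As far as I know, fromFilePairs by default only emits groups of exactly two files (its size option defaults to 2), so a read file whose mate is missing or misnamed is skipped silently rather than reported as an error. A rough bash preview of the grouping can flag such files before launching the pipeline. This is a sketch that imitates the grouping for the <ID>_R1.fastq / <ID>_R2.fastq naming only, not Nextflow's own implementation; preview_pairs is a hypothetical name, and its output format is only loosely modeled on the view() output above.

```shell
#!/usr/bin/env bash
# Rough preview of the pairs that '<dir>/*_R{1,2}.fastq' should yield.
# Imitates the grouping for <ID>_R1.fastq / <ID>_R2.fastq names only;
# this is not how Nextflow implements fromFilePairs.
set -euo pipefail

preview_pairs() {
  local dir="$1" f id n
  # Derive the unique IDs by stripping the _R1.fastq / _R2.fastq suffix.
  for f in "$dir"/*_R1.fastq "$dir"/*_R2.fastq; do
    [ -e "$f" ] || continue
    f=$(basename "$f")
    printf '%s\n' "${f%_R[12].fastq}"
  done | LC_ALL=C sort -u | while read -r id; do
    # Count how many of the two expected mates exist for this ID.
    n=0
    for f in "$dir/${id}_R1.fastq" "$dir/${id}_R2.fastq"; do
      if [ -e "$f" ]; then n=$((n + 1)); fi
    done
    if [ "$n" -eq 2 ]; then
      printf '[%s, [%s, %s]]\n' "$id" "$dir/${id}_R1.fastq" "$dir/${id}_R2.fastq"
    else
      printf 'unpaired: %s\n' "$id"
    fi
  done
}

# Hypothetical default: the read directory used in the question.
preview_pairs "${1:-sequences/sequences_split}"
```

Any `unpaired:` line points to a sample that would not reach the fastp process.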