For Loop: Identify Filename Pairs, Input to For Loop

Time:02-02

I am attempting to adapt a previously answered question for use in a for loop.

I have a folder containing multiple paired file names that need to be provided sequentially as input to a for loop.

Example Input

WT1_0min-SRR9929263_1.fastq
WT1_0min-SRR9929263_2.fastq
WT1_20min-SRR9929265_1.fastq
WT1_20min-SRR9929265_2.fastq
WT3_20min-SRR12062597_1.fastq
WT3_20min-SRR12062597_2.fastq

Paired file names can be identified with the answer from the previous question:

find . -name '*_1.fastq' -exec basename {} '_1.fastq' \; | xargs -n1 -I{} echo {}_1.fastq {}_2.fastq

I now want to adapt this for use in a for loop, so that each output file can be piped independently to subsequent commands and so that a suffix can be appended to each output file name.

Input files can be provided as a comma-separated list of files after the -1 and -2 flags respectively. So for this example, the bulk and undesired input would be:

-1 WT1_0min-SRR9929263_1.fastq,WT1_20min-SRR9929265_1.fastq,WT3_20min-SRR12062597_1.fastq

-2 WT1_0min-SRR9929263_2.fastq,WT1_20min-SRR9929265_2.fastq,WT3_20min-SRR12062597_2.fastq

However, I would like to run this as a for loop so that input files are provided sequentially:

Iteration #1
-1 WT1_0min-SRR9929263_1.fastq
-2 WT1_0min-SRR9929263_2.fastq


Iteration #2
-1 WT1_20min-SRR9929265_1.fastq
-2 WT1_20min-SRR9929265_2.fastq


Iteration #3
-1 WT3_20min-SRR12062597_1.fastq
-2 WT3_20min-SRR12062597_2.fastq

Below is an example of the for loop I would like to run, using the xargs command to pull the file names. It currently does not work. I assume I need to save the paired file names from the xargs command in a variable that can be referenced in the for loop?

find . -name '*_1.fastq' -exec basename {} '_1.fastq' \; | xargs -n1 -I{} echo {}_1.fastq {}_2.fastq

for file in *.fastq
do
  bowtie2 -p 8 -x /path/genome \
  1- {}_1.fastq \
  2- {}_2.fastq \
  "../path/${file%%.fastq}_UnMappedReads.fastq.gz" \
  2> "../path/${file%%.fastq}_Bowtie2_log.txt" | samtools view -@ 7 -b | samtools sort -@ 7 -m 5G -o "../path/${file%%.fastq}_Mapped.bam"
done

The expected outputs for the example would be:

WT1_0min-SRR9929263_UnMappedReads.fastq.gz
WT1_20min-SRR9929265_UnMappedReads.fastq.gz
WT3_20min-SRR12062597_UnMappedReads.fastq.gz
WT1_0min-SRR9929263_Bowtie2_log.txt
WT1_20min-SRR9929265_Bowtie2_log.txt
WT3_20min-SRR12062597_Bowtie2_log.txt
WT1_0min-SRR9929263_Mapped.bam
WT1_20min-SRR9929265_Mapped.bam
WT3_20min-SRR12062597_Mapped.bam

CodePudding user response:

I don't know what "bowtie2" or "samtools" are but best I can tell all you need is:

#!/usr/bin/env bash

for file1 in *_1.fastq; do
    file2="${file1%_1.fastq}_2.fastq"
    echo "$file1" "$file2"
done

Replace echo with whatever you want to do with a pair of files.
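As for why the loop in the question fails: {} is a placeholder that only find and xargs substitute, so inside a bash for loop it is just literal text. Below is a rough sketch of how the same suffix stripping could produce the output names listed in the question. pair_outputs is a hypothetical helper I made up for illustration, and ../path is the placeholder directory from the question; I have not run this against bowtie2 or samtools.

```shell
#!/usr/bin/env bash

# Hypothetical helper: for one *_1.fastq file, print the name of its mate
# followed by the three output names the question asks for.
# $1 = first file of the pair, $2 = output directory (placeholder).
pair_outputs() {
    local file1=$1 outdir=$2
    local base=${file1%_1.fastq}          # e.g. WT1_0min-SRR9929263
    printf '%s\n' \
        "${base}_2.fastq" \
        "${outdir}/${base}_UnMappedReads.fastq.gz" \
        "${outdir}/${base}_Bowtie2_log.txt" \
        "${outdir}/${base}_Mapped.bam"
}

for file1 in *_1.fastq; do
    [ -e "$file1" ] || continue           # skip the literal pattern when nothing matches
    pair_outputs "$file1" "../path"
done
```

In the real loop, "$file1" would go after -1 and "${base}_2.fastq" after -2, with the other three names used where the question currently has ${file%%.fastq}.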

If you HAD to use find for some reason then it'd be:

#!/usr/bin/env bash

while IFS= read -r file1; do
    file2="${file1%_1.fastq}_2.fastq"
    echo "$file1" "$file2"
done < <(find . -type f -name '*_1.fastq' -print)
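One thing to watch with the find version: each path comes back with a leading ./ (plus a directory part if the match is in a subfolder), so any output name built from it inherits that prefix. A minimal sketch, assuming you want output names based on the file name alone. sample_name is a hypothetical helper, and the check for a missing mate is my addition, not something the question asked for.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reduce a path such as ./sub/WT1_0min-SRR9929263_1.fastq
# to the sample name WT1_0min-SRR9929263.
sample_name() {
    local name=${1##*/}                   # drop everything up to the last /
    printf '%s\n' "${name%_1.fastq}"      # drop the _1.fastq suffix
}

while IFS= read -r file1; do
    file2="${file1%_1.fastq}_2.fastq"
    # Warn and skip when the second file of the pair does not exist.
    [ -e "$file2" ] || { echo "no mate for $file1" >&2; continue; }
    echo "$(sample_name "$file1")" "$file1" "$file2"
done < <(find . -type f -name '*_1.fastq' -print)
```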

or if your file names can contain newlines then:

#!/usr/bin/env bash

while IFS= read -r -d $'\0' file1; do
    file2="${file1%_1.fastq}_2.fastq"
    echo "$file1" "$file2"
done < <(find . -type f -name '*_1.fastq' -print0)
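
If you really do want the file names saved in a variable first, as the question suggests, a bash array is the usual choice. This is only a sketch under that assumption: collect_firsts and the firsts array are names I invented, and the loop still just echoes each pair.

```shell
#!/usr/bin/env bash

# Fill the global array "firsts" with every *_1.fastq path under directory $1.
# Paths are read NUL-delimited, so spaces and newlines in names are preserved.
collect_firsts() {
    firsts=()
    local f
    while IFS= read -r -d '' f; do
        firsts+=("$f")
    done < <(find "$1" -type f -name '*_1.fastq' -print0)
}

collect_firsts .
for file1 in "${firsts[@]}"; do
    file2="${file1%_1.fastq}_2.fastq"
    echo "$file1" "$file2"
done
```

find does not promise any particular order, so sort the array if the order of the iterations matters.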