Why is this bash loop failing to concatenate the files?

Time:10-20

I am at my wits' end as to why this loop is failing to concatenate the files the way I need. Basically, let's say we have the following files:

AB124661.lane3.R1.fastq.gz
AB124661.lane4.R1.fastq.gz

AB124661.lane3.R2.fastq.gz
AB124661.lane4.R2.fastq.gz

What we want is:

cat AB124661.lane3.R1.fastq.gz AB124661.lane4.R1.fastq.gz > AB124661.R1.fastq.gz
cat AB124661.lane3.R2.fastq.gz AB124661.lane4.R2.fastq.gz > AB124661.R2.fastq.gz
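Plain cat works here even though the files are compressed: gzip allows a file to consist of several compressed members back to back, and gzip -d / zcat decompress them in sequence. A minimal sketch to check this, assuming bash and gzip are installed; the file contents below are hypothetical stand-ins for real reads:

```shell
#!/usr/bin/env bash
# Check that byte-level concatenation of two gzip files is itself a valid gzip file.
# Assumes gzip is installed; names and contents below are hypothetical.

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Two tiny stand-ins for per-lane FASTQ files (one record each).
printf '@read1\nACGT\n+\nIIII\n' | gzip > AB124661.lane3.R1.fastq.gz
printf '@read2\nTTGA\n+\nIIII\n' | gzip > AB124661.lane4.R1.fastq.gz

# Same shape as the desired command: plain cat, no decompression step.
cat AB124661.lane3.R1.fastq.gz AB124661.lane4.R1.fastq.gz > AB124661.R1.fastq.gz

# gzip -t tests integrity; it accepts multi-member files.
if gzip -t AB124661.R1.fastq.gz; then integrity=ok; else integrity=bad; fi

# Decompressing the merged file yields both records, lane3 first.
merged=$(gzip -dc AB124661.R1.fastq.gz)
line_count=$(printf '%s\n' "$merged" | wc -l | tr -d ' ')
echo "integrity: $integrity, lines: $line_count"
```

So there is no need to decompress and recompress; the merged file is read exactly like a single-member one.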

What I tried (and didn't work):

  1. Create and save the sample IDs (e.g. AB124661) to an ID file:

ls -1 *R1*.gz | awk -F '.' '{print $1}' | sort | uniq > ID

This creates a file named ID that stores the sample names.

  2. Run the following loops:

for i in `cat ./ID`; do cat $i\.lane3.R1.fastq.gz $i\.lane4.R1.fastq.gz \> out/$i\.R1.fastq.gz; done

for i in `cat ./ID`; do cat $i\.lane3.R2.fastq.gz $i\.lane4.R2.fastq.gz \> out/$i\.R2.fastq.gz; done

The loops fail, and the output files come out empty.

Things I have checked:

  • Yes, the ID file is definitely in the folder.
  • When I run it with echo, it prints the cat command correctly.

Any help will be very much appreciated,

Best,

AC

Answer:

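  1. Why are you escaping the > as \>? The backslash turns it into a literal > argument for cat instead of a redirection, so cat reports cat: '>': No such file or directory and writes whatever it can read to standard output rather than to the output file.
  2. Don't read lines with for: the unquoted output of `cat ./ID` is split on whitespace and glob-expanded. Use a while read loop instead:
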
while IFS= read -r id; do
    cat "${id}.lane3.R1.fastq.gz" "${id}.lane4.R1.fastq.gz" > "out/${id}.R1.fastq.gz"
    cat "${id}.lane3.R2.fastq.gz" "${id}.lane4.R2.fastq.gz" > "out/${id}.R2.fastq.gz"
done < ./ID
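Both points can be checked in a scratch directory. A minimal sketch, assuming bash; the files a.txt, b.txt, out.txt and the second ID "sample two" are hypothetical test data, not anything from the question:

```shell
#!/usr/bin/env bash
# (1) \> reaches cat as a literal ">" argument, so no output file is created.
# (2) `for i in $(cat ID)` splits on spaces; `while IFS= read -r` keeps whole lines.
# All file names and IDs here are hypothetical.

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'lane3\n' > a.txt
printf 'lane4\n' > b.txt

# Escaped: cat gets the arguments a.txt b.txt > out.txt, prints the first two
# to stdout and fails on the files named ">" and "out.txt" (errors discarded).
escaped_stdout=$(cat a.txt b.txt \> out.txt 2>/dev/null)
if [ -e out.txt ]; then escaped_created=yes; else escaped_created=no; fi

# Unescaped: the shell performs the redirection and cat only sees two files.
cat a.txt b.txt > out.txt
merged=$(cat out.txt)

echo "escaped: out.txt created? $escaped_created; stdout was: $escaped_stdout"
echo "unescaped: out.txt contains: $merged"

# One ID per line, the second containing a space.
printf 'AB124661\nsample two\n' > ID

for_count=0
for i in $(cat ./ID); do for_count=$((for_count + 1)); done

while_count=0
while IFS= read -r id; do while_count=$((while_count + 1)); done < ./ID

echo "for saw $for_count items; while read saw $while_count lines"
```

The for loop sees three items (AB124661, sample, two) while the while read loop sees the two actual lines.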

Answer:

Let's say you have the IDs stored in the file ./ID, one per line:

while read -r line; do
    cat "$line".lane3.R1.fastq.gz "$line".lane4.R1.fastq.gz > "$line".R1.fastq.gz
    cat "$line".lane3.R2.fastq.gz "$line".lane4.R2.fastq.gz > "$line".R2.fastq.gz
done < ./ID 

Answer:

A pure shell solution could look like this:

for file in *.fastq.gz; do
    id=${file%%.*}
    [ -e "$id".R1.fastq.gz ] || cat "$id".*.R1.fastq.gz > "$id".R1.fastq.gz
    [ -e "$id".R2.fastq.gz ] || cat "$id".*.R2.fastq.gz > "$id".R2.fastq.gz
done
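To try the loop above without touching real data, it can be run in a scratch directory against tiny stand-in files. A minimal sketch, assuming bash and gzip; the sample contents ("lane3 R1" and so on) are hypothetical:

```shell
#!/usr/bin/env bash
# Run the glob-based loop from this answer on hypothetical lane files and
# show what ends up in each merged file. Assumes gzip is installed.

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Stand-in inputs following the question's naming scheme.
for lane in lane3 lane4; do
    for mate in R1 R2; do
        printf '%s %s\n' "$lane" "$mate" | gzip > "AB124661.$lane.$mate.fastq.gz"
    done
done

# The loop from the answer, unchanged. The [ -e ] guard makes each ID run once;
# the glob "$id".*.R1.fastq.gz expands in sorted order, so lane3 precedes lane4.
for file in *.fastq.gz; do
    id=${file%%.*}
    [ -e "$id".R1.fastq.gz ] || cat "$id".*.R1.fastq.gz > "$id".R1.fastq.gz
    [ -e "$id".R2.fastq.gz ] || cat "$id".*.R2.fastq.gz > "$id".R2.fastq.gz
done

r1=$(gzip -dc AB124661.R1.fastq.gz)
r2=$(gzip -dc AB124661.R2.fastq.gz)
printf 'R1:\n%s\nR2:\n%s\n' "$r1" "$r2"
```

The merged name AB124661.R1.fastq.gz does not match the pattern AB124661.*.R1.fastq.gz, so an output file is never fed back into itself. Note that the existence guard also means a merged file left over from an earlier run is kept rather than regenerated.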

Alternatively:

printf '%s\n' *.fastq.gz | cut -d. -f1 | sort -u |
while IFS= read -r id; do
    cat "$id".*.R1.fastq.gz > "$id".R1.fastq.gz
    cat "$id".*.R2.fastq.gz > "$id".R2.fastq.gz
done

This solution assumes filenames of interest don't contain newline characters.
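If file names might contain newlines, the pipeline can be switched to NUL-delimited records. A sketch under the assumption of GNU coreutils (cut -z and sort -z) and bash (read -d ''); it may not run with a BSD/macOS cut. The ID containing a newline is hypothetical test data:

```shell
#!/usr/bin/env bash
# NUL-delimited version of the printf | cut | sort | while pipeline.
# Assumes GNU coreutils (cut -z, sort -z) and bash (read -d '').

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical inputs: a normal ID and one whose name contains a newline.
odd_id=$'odd\nid'
for id in AB124661 "$odd_id"; do
    for lane in lane3 lane4; do
        for mate in R1 R2; do
            printf '%s\n' "$lane" | gzip > "$id.$lane.$mate.fastq.gz"
        done
    done
done

# Each record ends in NUL instead of newline, so a newline inside a name is
# just another character of the ID.
printf '%s\0' *.fastq.gz | cut -z -d. -f1 | sort -zu |
while IFS= read -r -d '' id; do
    cat "$id".*.R1.fastq.gz > "$id".R1.fastq.gz
    cat "$id".*.R2.fastq.gz > "$id".R2.fastq.gz
done

# Show the merged (decompressed) content for the awkward ID.
printf '%s\n' "$(gzip -dc "$odd_id".R1.fastq.gz)"
```

With the newline-delimited version, the same input would be read as two separate IDs, odd and id, and neither glob would match.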
