I have a directory full of paired input files (80 samples, so 160 files in total). An example of a paired input is:
G49Am24_1_100_a100_1.fq.gz
G49Am24_1_100_a100_2.fq.gz
All input pairs will have _1.fq.gz and _2.fq.gz at the end.
I'm using Trim Galore, which is a tool for cleaning genetic data. When I run the command on a single pair of files from within the directory, it works perfectly:
trim_galore --length 40 --quality 25 --paired ./G49Am24_1_100_a100_1.fq.gz ./G49Am24_1_100_a100_2.fq.gz
I'd like to run a loop that will clean all of the pairs of files. This is my first go at writing a loop, and I came up with:
for infile in *_1.fq.gz ; do
base=$(basename ${infile} _1.fq.gz) > trim_galore --length 40 --quality 25 --paired ${infile} ${base}_2.fq.gz
done
From the code above, I get the error message '--length: command not found' (multiple times).
Any ideas?
CodePudding user response:
Your syntax is incorrect: > is the redirection operator. As written, the line assigns the variable base (only for the duration of that one command), creates an empty file called trim_galore, and then tries to run a nonexistent command named --length.
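To see this parsing for yourself, here is a small sketch that reproduces the same shape of line in a scratch directory. The names demo_dir, status, and error.txt are made up for the demonstration, and the braces are only there so the shell's own error message can be captured in a file.

```shell
unset base
# Work in a scratch directory so the demo leaves nothing in your data folder.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

status=0
# Same shape as the failing line: an assignment, a redirection, then more words.
# "--length" is the first ordinary word, so the shell tries to run it as a command.
{ base=sample > trim_galore --length 40; } 2> error.txt || status=$?

echo "exit status: $status"   # 127 is the shell's "command not found" status
ls -l trim_galore             # an empty file, created by the redirection
cat error.txt                 # the shell's complaint about --length
echo "base is '${base-}'"     # empty: the assignment only applied to the failed command
```

Putting the assignment and the command on separate lines, as in the loop that follows, avoids all three effects.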
for infile in *_1.fq.gz; do
    base=$(basename "$infile" _1.fq.gz)
    second="${base}_2.fq.gz"
    trim_galore --length 40 --quality 25 --paired "$infile" "$second"
done
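With 80 samples it may be worth checking that every _1 file really has a partner before launching anything. The following is a minimal sketch under that assumption, not part of Trim Galore: mate_of is a hypothetical helper name, and the loop only prints each command as a dry run.

```shell
# Hypothetical helper: print the matching _2 file for a given _1 file,
# or return non-zero if that mate file does not exist.
mate_of() {
    base=$(basename "$1" _1.fq.gz)
    second="${base}_2.fq.gz"
    [ -e "$second" ] || return 1
    printf '%s\n' "$second"
}

for infile in *_1.fq.gz; do
    # If nothing matches, the pattern is left as literal text; skip it.
    [ -e "$infile" ] || continue
    if mate=$(mate_of "$infile"); then
        # Dry run: print the command. Remove "echo" to actually run trim_galore.
        echo trim_galore --length 40 --quality 25 --paired "$infile" "$mate"
    else
        echo "skipping $infile: no matching _2 file" >&2
    fi
done
```

Once the printed commands look right, removing the echo in front of trim_galore runs them for real.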
You could also use string substitution instead of basename:
for infile in *_1.fq.gz; do
    trim_galore --length 40 --quality 25 --paired "${infile}" "${infile/1.fq.gz}2.fq.gz"
done
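A closely related option is suffix removal with %, which only strips the text when it sits at the very end of the name and is available in any POSIX shell, whereas the ${var/pattern} form is a bash extension. This is a sketch using the example file name from the question; the variable second is just an illustrative name.

```shell
infile=G49Am24_1_100_a100_1.fq.gz

# Strip the trailing "_1.fq.gz" (only if it is at the end), then append "_2.fq.gz".
second="${infile%_1.fq.gz}_2.fq.gz"

echo "$second"   # G49Am24_1_100_a100_2.fq.gz
```

Unlike basename, this keeps any leading directory part, so it also works if the loop is given paths such as some_dir/*_1.fq.gz.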