I am trying to get the number of reads in my fastq files, and I want the output to also include the file names. I found a solution online that almost works, but I am still not getting the right output. Example:
My file names:
12S_C-T1-045_F_filt.fastq.gz
12S_C-T1-PL_F_filt.fastq.gz
...
The code I have found:
for file in ./*.fastq.gz
do
file_name=$(basename -s .fastq $file)
printf "$file_name\t$(cat ${file} | wc -l)/4|bc\n" >> no_reads_12S.txt
done
The output:
12S_C-T1-045_F_filt.fastq.gz 114/4|bc
12S_C-T1-PL_F_filt.fastq.gz 26455/4|bc
...
So clearly it is not doing the calculation right, and the numbers are not even correct. How should I fix this? I've also tried doing this:
for file in ./*.fastq.gz
do
file_name=$(basename -s .fastq.gz $file)
echo "$file_name"
echo $(zcat $file | wc -l)/4|bc
done
This works, but it prints the file names and read counts on separate rows.
Thanks!
Answer:
Based on the 2nd script, would you please try:
#!/bin/bash
for file in ./*.fastq.gz; do
file_name=$(basename -s .fastq.gz "$file")
printf "%s\t%d\n" "$file_name" "$(echo $(zcat "$file" | wc -l) / 4 | bc)"
done
Or as a one-liner:
for file in ./*.fastq.gz; do file_name=$(basename -s .fastq.gz "$file"); printf "%s\t%d\n" "$file_name" "$(echo $(zcat "$file" | wc -l) / 4 | bc)"; done
As the synopsis of printf is:
printf FORMAT [ARGUMENT]...
we need to feed strings as the arguments. The 1st argument, "$file_name", is self-explanatory. The 2nd argument, "$(echo $(zcat "$file" | wc -l) / 4 | bc)", may require explanation. First, the inner command $(zcat "$file" | wc -l) is substituted with the line count, i.e. the output of that pipeline. The outer command then looks like $(echo <number> / 4 | bc), which is in turn substituted with the result of bc and passed to printf.
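The two substitution steps can also be split into separate lines, which may be easier to read. The following is only a sketch under some assumptions: count_reads is a made-up helper name, the division is done with bash arithmetic expansion $(( )) instead of bc, every read is assumed to span exactly four lines, and the demo file is a throwaway created just for illustration.

```bash
#!/bin/bash
# count_reads is a hypothetical helper, not part of the scripts above.
# It prints the number of reads in one gzip-compressed FASTQ file,
# assuming exactly four lines per read.
count_reads() {
    local lines
    lines=$(zcat "$1" | wc -l)   # step 1: line count of the decompressed file
    echo $(( lines / 4 ))        # step 2: integer division done by bash itself
}

# Demo on a temporary file holding two reads (8 lines).
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$tmpdir/demo_F_filt.fastq.gz"

for file in "$tmpdir"/*.fastq.gz; do
    file_name=$(basename -s .fastq.gz "$file")
    # Name and count go on one row, separated by a tab.
    printf "%s\t%d\n" "$file_name" "$(count_reads "$file")"
done

rm -rf "$tmpdir"
```

For the demo file this prints demo_F_filt and 2 on a single row, separated by a tab. To use it on real data, the loop would run over ./*.fastq.gz as in the scripts above, and the output could be redirected to a file such as no_reads_12S.txt.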