I'm trying to write a for loop that unzips fastq.gz files that contain R1 in the file name, determines # of lines in each file, and divides # of lines by 4. Ideally I could also write this into a txt file with two columns (file name and # of lines/4).
This loop unzips R1 fastq files and determines # of lines in each file but does not divide by 4 (or save output into a txt file).
for i in $(ls ./R1); do gzcat ./$i | wc -l; done
Other posts on here suggest using bc to divide in bash, but I haven't been able to integrate this into a loop.
CodePudding user response:
Never use for i in $(ls anything); see Bash Pitfalls #1. The loop will fail for filenames that contain spaces or other special characters. In most circumstances you can simply iterate over the files with for i in path/*; do ... done, which handles any filename as long as you quote "$i".
If you need find (for example, to search recursively), note that a plain while read -r name; do ... done < <(find path -type f -name "*.gz") breaks on filenames that contain the '\n' character. The robust form uses NUL-delimited output: while IFS= read -r -d '' name; do ... done < <(find path -type f -name "*.gz" -print0).
(Note that process substitution, < <(...), and read -d are bash-only; in a POSIX shell you can pipe find into the loop instead, which is fine for names without newlines.)
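As a minimal sketch of the find-based loop described above, assuming bash and a find that supports -print0 (both GNU and BSD find do): the function name count_reads and the '*R1*.fastq.gz' pattern are illustrative choices, and gzip -dc is used in place of gzcat because it is available on both Linux and macOS.

```bash
#!/usr/bin/env bash
# count_reads DIR (hypothetical helper): print "name<TAB>lines/4"
# for every *R1*.fastq.gz file found under DIR.
count_reads() {
    local dir=$1 name nlines
    # NUL-delimited names survive spaces, tabs and newlines in filenames
    while IFS= read -r -d '' name; do
        [ -s "$name" ] || continue                  # skip zero-size files
        nlines=$(gzip -dc "$name" | wc -l)          # count decompressed lines
        printf '%s\t%s\n' "$name" "$((nlines / 4))" # integer division
    done < <(find "$dir" -type f -name '*R1*.fastq.gz' -print0)
}
```

Calling count_reads . > counts.txt would then write one tab-separated line per matching file.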
Next, to write the name and number of lines / 4 to a new file, wrap the entire loop in a command group between { ... } and redirect all of its output to the new file at once.
You should also add checks that skip any directory whose name ends in gz, as well as any empty file (zero file size).
If you put it all together, you could do something like:
{
  for i in R1/*.gz; do
    [ -d "$i" ] && continue                   ## skip any directories
    [ -s "$i" ] || continue                   ## skip empty files
    nlines=$(gzcat "$i" | wc -l)              ## get number of lines
    printf "%s\t%s\n" "$i" "$((nlines / 4))"  ## output name, nlines / 4
  done
} > newfile                                   ## redirect all output to newfile
(The output is written with a tab character, "\t", separating the name and the number / 4; adjust as desired.)
Look things over and let me know if you have any questions.
CodePudding user response:
The loop below works if you accept that 5 / 4 = 1 (integer division rounds down to the nearest integer). If you want decimals (5 / 4 = 1.25), you'll need bc or awk.
for i in $(ls ./R1); do
  nb_lines=$(gzcat "./$i" | wc -l)
  echo $((nb_lines / 4))
done
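For the decimal case mentioned above, here is a minimal sketch using awk, which counts the lines itself so no separate wc -l call is needed. The function name lines_div4 and the two-decimal format are assumptions made for illustration, and gzip -dc stands in for gzcat.

```bash
# lines_div4 FILE (hypothetical helper): print (number of lines) / 4
# with two decimal places, e.g. a 5-line file gives 1.25.
lines_div4() {
    # NR holds the number of records (lines) read once END is reached
    gzip -dc "$1" | awk 'END { printf "%.2f\n", NR / 4 }'
}
```

Inside a loop this could be used as printf '%s\t%s\n' "$i" "$(lines_div4 "$i")".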
CodePudding user response:
The simplest way to do integer arithmetic is the $((...)) notation, as these simple examples show:
Prompt> echo $((2*6))
12
Prompt> echo $((20/4))
5
Prompt> echo $((21/4))
5
It can also be combined with other commands, such as wc -l:
Prompt> cat .viminfo | wc -l
287
Prompt> echo $(($(cat .viminfo | wc -l) / 4))
71