How do I convert BAM to FASTQ in pipe to analyze and then remove the fastq generated?-CodePudding

Sorry for this question but I am a bioinformatician with not much programming experience and I don't know how to fix it.

I have several bam files in a server directory. I would like to count with grep the number of occurrences both in the bam and in the corrisponding fastq file but, due to the lack of space, I also need to remove each fastq file after the analysis.

To count the number of occurrences in the BAM I tried with this (but any suggestions are welcome):

for i in *bam* ; do
    echo $i >> TELO_BAM
    grep 'TTAGGG' $i |
    wc -l >> TELO_BAM 
done

As you can see I also wanted to put the file name and the count in a txt file.

Now, to count the number in a fastq I tried with this but I don't know how to delete the fastq as soon as I've done the count (so it doesn't take up space):

for i in *bam*; do
    samtools bam2fq $i >> $i\.fq |
    echo $i >> TELO_FQ
    grep 'TTAGGG' $i >> TELO_FQ
done

I don't know if it works, but I would like to convert the bam to fastq, put the filename in a text file, count the occurrences in the fastq and then delete it and move on to the next file.

CodePudding user response：

grep XXX | wc -l gives you the number of lines that contain XXX, not the number of occurrences (unless each line can contain at most one occurrence).
With GNU grep -o XXX | wc -l you'll be able to count the total number of occurrences.
Using >> TELO_XXX inside the loops is sub-optimal, it's better to use a file descriptor.
~~You name the FastQ file $i.fq, so you just have to rm "$i.fq" after using it.~~ No need to create any FastQ file; samtools bam2fq writes its output to stdout so you can grep it directly`.

Here's an untested code illustrating all that:

exec 3> TELO_BAM || exit 1
exec 4> TELO_FQ  || exit 1

for bam in *bam*
do
    printf '%s: ' "$bam" 1>&3
    grep -o 'TTAGGG' "$bam" | wc -l 1>&3

    printf '%s: ' "$bam.fq" 1>&4
    samtools bam2fq "$bam" | grep -o 'TTAGGG' | wc -l 1>&4
done