I have 96 BAM files. How do I write each file's output to a .txt file named after its unique sample ID? I am looping through the BAM files, but I need a separate output file for each one, for example SC845414.txt.
Typical BAM file names:
SC845414-CTGATCGT-GCGCATAT_Aligned.sortedByCoord.out.bam
SC845425-TGTGACTG-AGCCTATC_Aligned.sortedByCoord.out.bam
#!/bin/bash
#SBATCH --mem=110g
#SBATCH --cpus-per-task=12
#SBATCH --time=10-00:00:00
module load python
DIR=/PATH/*
for d in $DIR; do
python -m HTSeq.scripts.count -s yes -f bam "$d" /PATH1/gencode.v35.annotation.gtf > /PATH3/HTseq/SC845414.txt
done
Answer:
It depends on exactly what you mean by "sample ID".
Based on your example, if you mean "the part of the file name before the first dash", then you could do this:
for d in $DIR; do
id=$(basename "$d" | cut -f 1 -d -)
python -m HTSeq.scripts.count -s yes -f bam "$d" /PATH1/gencode.v35.annotation.gtf > "/PATH3/HTseq/$id.txt"
done
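Before submitting a 10-day job it can help to do a dry run that only prints which output file each BAM would map to, and warns if two BAMs would share an ID (the later one would silently overwrite the earlier one's counts). A minimal sketch using the same basename | cut extraction; the two paths are the question's example names under its placeholder directory /PATH, the function name sample_id is hypothetical, and the associative array needs bash 4 or later:

```shell
#!/bin/bash
# Dry run: print BAM -> output-file mapping and flag duplicate sample IDs.
# The paths below are the question's example names under the placeholder /PATH;
# in practice loop over the same glob as DIR instead.

# Hypothetical helper: strip the directory, then keep field 1 of the '-'-separated name
sample_id() {
    basename "$1" | cut -f 1 -d -
}

declare -A seen=()   # sample ID -> first BAM that produced it (bash 4+)
for d in /PATH/SC845414-CTGATCGT-GCGCATAT_Aligned.sortedByCoord.out.bam \
         /PATH/SC845425-TGTGACTG-AGCCTATC_Aligned.sortedByCoord.out.bam; do
    id=$(sample_id "$d")
    if [[ -n ${seen[$id]+x} ]]; then
        echo "WARNING: $d and ${seen[$id]} share sample ID $id" >&2
    fi
    seen[$id]=$d
    echo "$d -> /PATH3/HTseq/$id.txt"
done
```

If no warning appears for the real 96 files, the ID extraction is safe to use in the HTSeq loop.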
Answer:
The same approach, but using Bash's built-in parameter expansion instead of calling basename and cut:
for d in $DIR; do
fname=${d##*/}
python -m HTSeq.scripts.count -s yes -f bam "$d" /PATH1/gencode.v35.annotation.gtf > "/PATH3/HTseq/${fname%%-*}.txt"
done
(edited to strip any leading path as well)
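To see what each expansion does, here is a sketch that applies them one at a time to one of the example file names from the question (the /PATH directory is the question's placeholder). It also shows why the double %% matters: a single % removes only the shortest matching suffix.

```shell
#!/bin/bash
# Apply the two parameter expansions step by step to an example path.
d=/PATH/SC845414-CTGATCGT-GCGCATAT_Aligned.sortedByCoord.out.bam

fname=${d##*/}     # remove the longest prefix matching '*/', i.e. the directory part
id=${fname%%-*}    # remove the longest suffix matching '-*', i.e. from the first '-' on
short=${fname%-*}  # single '%': removes only from the LAST '-' on

echo "$fname"      # SC845414-CTGATCGT-GCGCATAT_Aligned.sortedByCoord.out.bam
echo "$id"         # SC845414
echo "$short"      # SC845414-CTGATCGT
```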
Unfortunately, stripping both the leading and trailing parts of a variable in a single expansion is beyond me (at the moment).
It seems it should be doable; see: https://www.thegeekstuff.com/2010/07/bash-string-manipulation/
(No affiliation or endorsement; just the first relevant web search result.)
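For what it's worth, Bash does not allow nesting parameter expansions, but a regular-expression match with the [[ =~ ]] operator and the BASH_REMATCH array can do the extraction in one step. This is a different technique from parameter expansion; a sketch, assuming the sample ID is the leading part of the file name up to its first '-' and contains no '/':

```shell
#!/bin/bash
# One-step extraction with a regex match instead of two parameter expansions.
# Group 1 captures characters that are neither '/' nor '-' just before a '-';
# the trailing [^/]*$ forces the rest of the match to stay inside the file name,
# so hyphens in directory names are not picked up.
re='([^/-]+)-[^/]*$'

d=/PATH/SC845414-CTGATCGT-GCGCATAT_Aligned.sortedByCoord.out.bam
if [[ $d =~ $re ]]; then       # $re must be unquoted to be treated as a regex
    id=${BASH_REMATCH[1]}
    echo "$id"                 # SC845414
else
    echo "no sample ID found in $d" >&2
fi
```

Storing the pattern in a variable avoids quoting problems with the special characters inside [[ ]].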