I have a folder with a mixture of file types (.bam, .bam.bai, and .log). I created a for loop to run two commands on each of the .bam files. My current code directs the output of each command into a separate csv file, because I could not figure out how to direct the outputs to separate columns of one file.
TYIA!
Question 1
I want to export the output from both commands into the same csv. How can I alter my code so that the output from my first command is saved as the first column of a csv and the output from my second command is saved as the second column of the same csv?
Question 2
What is the name of the syntax used to select files in a for loop? For instance, the * in *.bam represents a wildcard. Is this regex? I had a tough time trying to alter this so that only *.bam files were selected for the for loop (and .bam.bai were excluded). I ended up with *[.bam] by guessing and empirically testing my outputs. Are there any websites that do a good job of explaining this syntax and provide lots of examples (coder level: newbie)?
Current Code
> ~/Desktop/Sample_Names.csv
> ~/Desktop/Read_Counts.csv
echo "Sample" | cat - > ~/Desktop/Sample_Names.csv
echo "Total_Reads" | cat - > ~/Desktop/Read_Counts.csv
for file in *[.bam]
do
samtools view -c $file >> ~/Desktop/Read_Counts.csv
gawk -v RS="^$" '{print FILENAME}' $file >> ~/Desktop/Sample_Names.csv
done
Current Outputs (truncated)
>Sample_Names.csv
| Sample |
|--------------|
| B40-JV01.bam |
| B40-JV02.bam |
| B40-JV03.bam |
>Read_Counts.csv
| Total_Reads |
|-------------|
| 3835555 |
| 4110463 |
| 144558 |
Desired Output
>Combined_Outputs.csv
| Sample | Total_Reads |
|--------------|-------------|
| B40-JV01.bam | 3835555 |
| B40-JV02.bam | 4110463 |
| B40-JV03.bam | 144558 |
CodePudding user response:
Something like this:
echo "Sample,Total_Reads" > Combined_Outputs.csv
for file in *.bam; do
printf "%s,%s\n" "$file" "$(samtools view -c "$file")"
done >> Combined_Outputs.csv
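This prints one line per file, with the file name and its read count separated by a comma, and moves the output redirection outside of the loop so the csv is opened once rather than once per iteration.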
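For Question 2: the pattern syntax is called globbing (the bash manual describes it under filename expansion and pattern matching), and it is not regex. `*` matches any run of characters, and `[...]` matches exactly one character out of the set inside the brackets. So `*[.bam]` means "any name whose last character is `.`, `b`, `a` or `m`", which only appeared to work because .bai and .log end in other letters. Plain `*.bam` matches names that end in the literal text `.bam`, so it already skips the .bam.bai files. Here is a small sketch you can run to check this yourself, assuming a throwaway directory and made-up file names (`tmpdir`, `program` and the `B40-JV01` names are placeholders, not your data):

```shell
# Scratch directory with placeholder file names (hypothetical, for illustration only).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch B40-JV01.bam B40-JV01.bam.bai B40-JV01.log program

# *.bam: names that end in the literal text ".bam".
exact=$(echo *.bam)
echo "$exact"    # prints: B40-JV01.bam

# *[.bam]: names whose LAST character is one of ".", "b", "a" or "m".
loose=$(echo *[.bam])
echo "$loose"    # prints: B40-JV01.bam program
```

The second pattern also picks up `program`, simply because it ends in `m`, so the bracket version is a fragile way to select .bam files.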