Bash: For Loop & save each output as a new column in a csv

Time:06-17

I have a folder with a mixture of file types (.bam, .bam.bai, and .log). I created a for loop to run two commands on each of the .bam files. My current code directs the output of each command into a separate csv file, because I could not figure out how to direct the outputs to separate columns.

TYIA!

Question 1
I want to export the output from both commands into the same csv. How can I alter my code so that the output from my first command is saved as the first column of a csv, and the output from my second command is saved as the second column of the same csv?

Question 2
What is the name of the syntax used to select files in a for loop? For instance, the * in *.bam represents a wildcard. Is this regex? I had a tough time trying to alter this so that only *.bam files were selected for the for loop (and .bam.bai were excluded). I ended up with *[.bam] by guessing and empirically testing my outputs. Are there any websites that do a good job of explaining this syntax and provide lots of examples (coder level: newbie)?

Current Code

> ~/Desktop/Sample_Names.csv
> ~/Desktop/Read_Counts.csv

echo "Sample" | cat - > ~/Desktop/Sample_Names.csv
echo "Total_Reads" | cat - > ~/Desktop/Read_Counts.csv

for file in *[.bam]
do
  samtools view -c $file >> ~/Desktop/Read_Counts.csv
  gawk -v RS="^$" '{print FILENAME}' $file >> ~/Desktop/Sample_Names.csv
done

Current Outputs (truncated)

>Sample_Names.csv
| Sample       |
|--------------|
| B40-JV01.bam |
| B40-JV02.bam |
| B40-JV03.bam |

>Read_Counts.csv
| Total_Reads |
|-------------|
| 3835555     |
| 4110463     |
| 144558      |

Desired Output

>Combined_Outputs.csv
| Sample       | Total_Reads |
|--------------|-------------|
| B40-JV01.bam | 3835555     |
| B40-JV02.bam | 4110463     |
| B40-JV03.bam | 144558      |

CodePudding user response:

Something like this should work:

echo "Sample,Total_Reads" > Combined_Outputs.csv
for file in *.bam; do
    printf "%s,%s\n" "$file" "$(samtools view -c "$file")"
done >> Combined_Outputs.csv

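This prints one line per file, with the file name and the read count separated by a comma. The output redirection is moved outside of the loop so the csv is opened once instead of on every iteration.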
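
If the two single-column files from the question already exist, another option is to join them side by side with paste instead of rerunning samtools. The sketch below is only an illustration: it builds stand-in copies of Sample_Names.csv and Read_Counts.csv in a scratch directory (the mktemp directory is an assumption, not part of the original setup) using the values shown in the question, then combines them. It assumes both files have the same number of lines in the same order, which holds when each loop iteration appends one line to each file.

```shell
# Work in a scratch directory so nothing on the Desktop is touched (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-ins for the two files the question's loop already produces.
printf '%s\n' Sample B40-JV01.bam B40-JV02.bam B40-JV03.bam > Sample_Names.csv
printf '%s\n' Total_Reads 3835555 4110463 144558 > Read_Counts.csv

# paste joins line N of each input file; -d, sets the separator to a comma.
paste -d, Sample_Names.csv Read_Counts.csv > Combined_Outputs.csv

cat Combined_Outputs.csv
```

The printf loop in the answer above is still the simpler route when starting from scratch, because it never creates the intermediate files.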

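On Question 2: this syntax is not regex. It is usually called globbing; the Bash manual describes it under "Filename Expansion" and "Pattern Matching", which are reasonable places to start reading. In a glob, * matches any run of characters and [...] matches exactly one character from the listed set. So *[.bam] means "any name whose last character is ., b, a, or m", which happens to exclude .bam.bai and .log only because those end in i and g. The small demo below shows the difference; the scratch directory and the file notes.b are hypothetical and exist only to expose the mismatch.

```shell
# Scratch directory with a few empty files (names are illustrative).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch B40-JV01.bam B40-JV01.bam.bai B40-JV01.log notes.b

# *.bam : any name that ends with the literal text ".bam".
for file in *.bam; do
    echo "star-dot-bam: $file"
done

# *[.bam] : any name whose LAST character is one of '.', 'b', 'a', 'm'.
# This also picks up notes.b, which is not a .bam file.
for file in *[.bam]; do
    echo "bracket: $file"
done
```

The first loop reports only B40-JV01.bam, while the second also reports notes.b, so *.bam is the pattern that actually says "files ending in .bam".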