I am trying to write a for loop that, for each sample, passes the matching value from a csv file to the command in the do block.
My situation is as follows: I have several directories containing genome sequences. The samples are numbered and the directories are named accordingly.
Dir 1 contains sample1_genome.fasta
Dir 2 contains sample2_genome.fasta
Dir 3 contains sample3_genome.fasta
The genome sequences have differing average read lengths, and it is important to account for this. Therefore, I created a csv file containing the sample number and the corresponding average read length of each genome sequence. Example csv file (first column = sample_no, second column = avg_read_length):
1,130
2,134
3,129
Now I want to loop through the directories, take the genome sequences as input, and pass the respective average read length to the process.
My code is as follows:
for f in *
do
shortbred_quantify.py --genome $f/sample${f%}.fasta --average_read_length *THE SAMPLE MATCHING VALUE FROM 2nd COLUMN* --results results/quantify_results_sample${f%}
done
Can you help me out with this?
CodePudding user response:
Use awk. $2 is the second field and $1 is the first, e.g.:
$ cat input
1,130
2,134
3,129
$ awk '$1 == sample { print $2 }' FS=, sample=2 input
134
So, with the csv saved as input, your command ends up looking like:
genome="$f"/genome_sample.fasta
shortbred_quantify.py --genome "$genome" \
--avgreadBP "$(awk '$1 == s { print $2 }' FS=, s="$f" input)" \
--results results/quantify_results_sample"${f}"
Don't forget to quote the filename.
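Putting the lookup and the loop together, a rough sketch could look like the following. It assumes the directories are named after the sample numbers (1, 2, 3), that each one holds a file called genome_sample.fasta as in the command above, and that the csv path is passed in by the caller. The function names read_length_for and quantify_all are made up for illustration. The echo makes it a dry run that only prints the commands.

```shell
# Sketch only: directory layout, file names and function names are assumptions.

# Print the 2nd csv column of the row whose 1st column equals the sample number.
# $1 = sample number, $2 = csv file
read_length_for() {
    awk -F, -v s="$1" '$1 == s { print $2; exit }' "$2"
}

# Loop over the sample directories and print one command per sample.
# $1 = csv file
quantify_all() {
    for dir in */; do
        sample=${dir%/}                       # strip the trailing slash
        len=$(read_length_for "$sample" "$1")
        [ -n "$len" ] || continue             # skip directories with no csv entry
        # Dry run: remove "echo" to actually execute the command.
        echo shortbred_quantify.py --genome "$sample/genome_sample.fasta" \
            --avgreadBP "$len" \
            --results "results/quantify_results_sample$sample"
    done
}
```

Calling quantify_all with the csv file as its only argument from the directory that holds the sample folders prints the commands. Directories without a row in the csv, such as results, are skipped.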
CodePudding user response:
I would structure it along these lines:
while IFS=, read -r sample read_length
do
shortbred_quantify.py --genome "$sample/genome_sample.fasta" --avgreadBP "$read_length" --results "results/quantify_results_sample$sample"
done < your.csv
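A slightly more defensive sketch of the same loop is below. It assumes the csv might carry a header row, blank lines, Windows line endings, or no newline after the last row. The function name quantify_from_csv is invented for illustration, and the echo makes it a dry run that only prints the commands.

```shell
# Sketch only: reads "sample,read_length" rows from standard input.
quantify_from_csv() {
    # "|| [ -n "$sample" ]" also processes a last line that lacks a newline.
    while IFS=, read -r sample read_length || [ -n "$sample" ]; do
        # Drop a trailing carriage return left over from Windows line endings.
        read_length=${read_length%"$(printf '\r')"}
        case $sample in
            ''|*[!0-9]*) continue ;;   # skip blank lines and a header row
        esac
        # Dry run: remove "echo" to actually execute the command.
        echo shortbred_quantify.py --genome "$sample/genome_sample.fasta" \
            --avgreadBP "$read_length" \
            --results "results/quantify_results_sample$sample"
    done
}
```

Run it as quantify_from_csv < your.csv. Once the echo is removed, a command that itself reads standard input would consume the rest of the csv, so in that case redirect its input from /dev/null.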