How to parse specific values from a csv file into a for loop command?

I am trying to write a for loop where I conditionally parse specific values from a csv file into the do command.

My situation is as follows: I have several directories containing genome sequences. The samples are numbered and the directories are named accordingly.

Dir 1 contains sample1_genome.fasta
Dir 2 contains sample2_genome.fasta
Dir 3 contains sample3_genome.fasta

The genome sequences have differing average read lengths, and it is important to account for this. Therefore, I created a csv file containing the sample number and the corresponding average read length of each genome sequence. Example csv file (first column = sample_no, second column = avg_read_length):

1,130
2,134
3,129

Now I want to loop through the directories, take the genome sequences as input, and pass the respective average read length to the process.

My code is as follows:

for f in *
do 
     shortbred_quantify.py --genome $f/sample${f%}.fasta --average_read_length *THE SAMPLE MATCHING VALUE FROM 2nd COLUMN* --results results/quantify_results_sample${f%}
done

Can you help me out with this?

CodePudding user response：

Use awk. $1 is the first field (the sample number) and $2 is the second (the average read length), so match on $1 and print $2. E.g.:

$ cat input
1,130
2,134
3,129
$ awk '$1 == sample { print $2 }' FS=, sample=2 input
134

So, assuming the directories are named after the sample numbers and the csv file is called your.csv, the command inside your loop ends up looking like:

genome="$f/sample${f}_genome.fasta"
shortbred_quantify.py --genome "$genome" \
    --avgreadBP "$(awk '$1 == s { print $2 }' FS=, s="$f" your.csv)" \
    --results "results/quantify_results_sample$f"

Don't forget to quote the filename.
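
For what it's worth, here is a minimal dry-run sketch of the whole loop with the awk lookup wrapped in a function. The function name avg_read_length, the temporary csv, and the hard-coded sample list are only there for illustration; the file naming follows the question (sample1_genome.fasta in directory 1), the flag names are taken from the answers in this thread, and the csv is assumed to have no header line.

```shell
#!/bin/sh
# Hypothetical helper: print the average read length for a sample number.
# Assumes a two-column csv (sample_no,avg_read_length) without a header.
avg_read_length() {
    awk -F, -v s="$1" '
        { sub(/\r$/, "") }          # tolerate Windows line endings
        $1 == s { print $2; exit }  # first matching row wins
    ' "$2"
}

# Demo data from the question, written to a temporary file.
csv=$(mktemp)
trap 'rm -f "$csv"' EXIT
printf '%s\n' 1,130 2,134 3,129 > "$csv"

for f in 1 2 3; do
    len=$(avg_read_length "$f" "$csv")
    # echo makes this a dry run; remove it to actually run the tool.
    echo shortbred_quantify.py --genome "$f/sample${f}_genome.fasta" \
        --avgreadBP "$len" --results "results/quantify_results_sample$f"
done
```

In a real run you would loop over the directories rather than a fixed list. Note that a bare `for f in *` also matches the results directory and the csv file if they sit in the same place, so a narrower glob may be safer.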

CodePudding user response：

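I would structure it along these lines, reading the csv directly instead of globbing the directories (this assumes the directories are named after the sample numbers):

while IFS=, read -r sample read_length
do
    shortbred_quantify.py --genome "$sample/sample${sample}_genome.fasta" --avgreadBP "$read_length" --results "results/quantify_results_sample$sample"
done < your.csv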
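
If the csv might contain blank lines, Windows line endings, or no trailing newline, a slightly more defensive version of the same loop could look like the sketch below. It only prints the commands (dry run). The function name quantify_all and the temporary csv are hypothetical, the file naming follows the question, and the csv is assumed to have no header line.

```shell
#!/bin/sh
# Sketch: print one shortbred_quantify.py command per csv row.
quantify_all() {
    cr=$(printf '\r')
    # "|| [ -n "$sample" ]" keeps a last line that lacks a trailing newline.
    while IFS=, read -r sample read_length || [ -n "$sample" ]; do
        read_length=${read_length%"$cr"}    # drop a trailing carriage return
        # Skip blank or incomplete rows.
        [ -n "$sample" ] && [ -n "$read_length" ] || continue
        # echo makes this a dry run; remove it to actually run the tool.
        echo shortbred_quantify.py \
            --genome "$sample/sample${sample}_genome.fasta" \
            --avgreadBP "$read_length" \
            --results "results/quantify_results_sample$sample"
    done < "$1"
}

# Demo data from the question, plus a blank line and no final newline.
csv=$(mktemp)
trap 'rm -f "$csv"' EXIT
printf '1,130\n2,134\n\n3,129' > "$csv"
quantify_all "$csv"
```

One thing to watch once the echo is removed: if the command inside the loop happens to read standard input, it will swallow the rest of the csv. Redirecting its input with `< /dev/null` avoids that.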