Combining loop with awk

Time:09-05

I need help combining awk with a loop.

I have two files, one Bedfile.bed and a Samplelist.txt that look like this:

Bedfile.bed

HiC_scaffold_2  1       50001

HiC_scaffold_2  400001  450001

HiC_scaffold_2  800001  850001

Samplelist.txt

sampleA
sampleB
sampleC

I would like to create a new BED file for each sample in Samplelist.txt, in which the sample name is appended as a new column on every line and also appears in the output file name. For example, for the first two samples:

Bedfile_SampleA.bed

HiC_scaffold_2  1       50001 SampleA

HiC_scaffold_2  400001  450001 SampleA

HiC_scaffold_2  800001  850001 SampleA

Bedfile_SampleB.bed

HiC_scaffold_2  1       50001 SampleB

HiC_scaffold_2  400001  450001 SampleB

HiC_scaffold_2  800001  850001 SampleB

I have done this for one file with the command below, but I have more than a hundred files, so I would like to do some sort of loop using a sample list.

awk ' {print $1"\t"$2"\t"$3"\t""SampleA"}' Bedfile.bed >  Bedfile_SampleA.bed

Any suggestion?

CodePudding user response:

You can do the operation and the loop all in AWK, but if you wanted to do the loop 'separately' for another reason, you could use:

while read -r sample
do
     awk -v var="$sample" 'BEGIN{OFS="\t"} {print $0, var}' Bedfile.bed > Bedfile_"$sample".bed
done < Samplelist.txt
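For anyone who wants to try the loop before running it on real data, here is a minimal, self-contained sketch. It assumes the file names from the question (Bedfile.bed, Samplelist.txt) and tab-separated BED lines, and it works in a scratch directory created with mktemp (my choice, not part of the question). The two extra guards are also my additions: they keep the last sample when the list has no trailing newline, and they skip blank lines.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the example inputs from the question (tab-separated BED lines).
printf 'HiC_scaffold_2\t1\t50001\nHiC_scaffold_2\t400001\t450001\nHiC_scaffold_2\t800001\t850001\n' > Bedfile.bed
# Deliberately no trailing newline after the last sample.
printf 'sampleA\nsampleB\nsampleC' > Samplelist.txt

# `|| [ -n "$sample" ]` still processes the last line when it lacks a newline.
while IFS= read -r sample || [ -n "$sample" ]
do
    # Skip blank lines in the sample list.
    [ -n "$sample" ] || continue
    # Append the sample name as a fourth, tab-separated column.
    awk -v var="$sample" 'BEGIN{OFS="\t"} {print $0, var}' Bedfile.bed > "Bedfile_${sample}.bed"
done < Samplelist.txt

# List the per-sample files that were created.
ls Bedfile_*.bed
```

The output names follow the spelling in Samplelist.txt, so a line `sampleA` gives `Bedfile_sampleA.bed`, not `Bedfile_SampleA.bed`.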

CodePudding user response:

This is very straightforward in awk. First you read the sample file into memory, and then you process the full bed file:

awk 'BEGIN{OFS="\t"}(FNR==NR){a[$0]; next}{for(i in a){f=FILENAME"."i; print $0,i > f}}' sample.txt bed.txt
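With more than a hundred samples, an all-awk approach that keeps one output file open per sample may hit the open-file limit in some awk implementations. The sketch below is one possible variant, not the answerer's method: it reverses the roles, holding the BED lines in memory and writing one sample file at a time, closing each before moving on. It assumes the BED file is small enough to fit in memory and is not empty, and it uses the file names from the question. The scratch directory and the variable names (`bed`, `n`, `out`) are illustrative.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the example inputs from the question (tab-separated BED lines).
printf 'HiC_scaffold_2\t1\t50001\nHiC_scaffold_2\t400001\t450001\nHiC_scaffold_2\t800001\t850001\n' > Bedfile.bed
printf 'sampleA\nsampleB\nsampleC\n' > Samplelist.txt

awk 'BEGIN { OFS = "\t" }
     # First file (Bedfile.bed): store every line in order.
     FNR == NR { bed[++n] = $0; next }
     # Second file (Samplelist.txt): one output file per non-empty line.
     $0 != "" {
         out = "Bedfile_" $0 ".bed"
         for (i = 1; i <= n; i++)
             print bed[i], $0 > out
         close(out)    # only one output file is open at any time
     }' Bedfile.bed Samplelist.txt

# List the per-sample files that were created.
ls Bedfile_*.bed
```

Note the argument order: the BED file comes first here, the sample list second.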