I need help combining awk with a loop.
I have two files, Bedfile.bed and Samplelist.txt, that look like this:
Bedfile.bed
HiC_scaffold_2 1 50001
HiC_scaffold_2 400001 450001
HiC_scaffold_2 800001 850001
Samplelist.txt
sampleA
sampleB
sampleC
I would like to create a new bed file for each sample in Samplelist.txt, in which the sample name is added as a new column on each line and also appears in the output file name. For example, for the first two samples:
Bedfile_SampleA.bed
HiC_scaffold_2 1 50001 SampleA
HiC_scaffold_2 400001 450001 SampleA
HiC_scaffold_2 800001 850001 SampleA
Bedfile_SampleB.bed
HiC_scaffold_2 1 50001 SampleB
HiC_scaffold_2 400001 450001 SampleB
HiC_scaffold_2 800001 850001 SampleB
I have done this for one file but I have more than a hundred files, so I would like to do some sort of loop using a sample list.
awk ' {print $1"\t"$2"\t"$3"\t""SampleA"}' Bedfile.bed > Bedfile_SampleA.bed
Any suggestion?
CodePudding user response:
You can do the operation and the loop all in AWK, but if you wanted to do the loop 'separately' for another reason, you could use:
while read -r sample
do
    awk -v var="$sample" 'BEGIN{OFS="\t"} {print $0, var}' Bedfile.bed > Bedfile_"$sample".bed
done < Samplelist.txt
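A sample list that was edited by hand may contain blank lines or lack a final newline, and a plain `while read` loop would then create an oddly named file or skip the last sample. Here is a minimal sketch of a more defensive loop. The scratch directory and the two-line demo inputs are made up for illustration, and the file names are assumed to match the question:

```shell
# Demo inputs in a scratch directory (illustrative copy of the question's data)
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'HiC_scaffold_2\t1\t50001\nHiC_scaffold_2\t400001\t450001\n' > Bedfile.bed
printf 'sampleA\n\nsampleB' > Samplelist.txt   # one blank line, no final newline

# `|| [ -n "$sample" ]` keeps a last line that has no trailing newline
while IFS= read -r sample || [ -n "$sample" ]
do
    [ -z "$sample" ] && continue               # skip empty lines
    awk -v var="$sample" 'BEGIN{OFS="\t"} {print $0, var}' Bedfile.bed > "Bedfile_${sample}.bed"
done < Samplelist.txt
```

This starts one awk process per sample, which should be fine for a few hundred samples.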
CodePudding user response:
This is very straightforward in awk. First you read the sample file into memory, and then you process the full bed file, writing each line to one output file per sample:
awk 'BEGIN{OFS="\t"}(FNR==NR){a[$0]; next}{for(i in a){f="Bedfile_"i".bed"; print $0,i > f}}' Samplelist.txt Bedfile.bed
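If the per-sample files are written from inside awk, note that some awk implementations limit how many output files can be open at once, which could matter with more than a hundred samples. Below is a sketch that keeps only one output file open at a time, assuming the bed file is small enough to hold in memory. The scratch directory and demo inputs are made up for illustration:

```shell
# Demo inputs in a scratch directory (illustrative copy of the question's data)
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'HiC_scaffold_2\t1\t50001\nHiC_scaffold_2\t400001\t450001\n' > Bedfile.bed
printf 'sampleA\nsampleB\nsampleC\n' > Samplelist.txt

awk 'BEGIN { OFS = "\t" }
     NR == FNR { bed[++n] = $0; next }   # first file: keep the bed lines in memory
     NF == 0   { next }                  # second file: skip blank sample lines
     {
         out = "Bedfile_" $1 ".bed"
         for (i = 1; i <= n; i++) print bed[i], $1 > out
         close(out)                      # finish this file before starting the next
     }' Bedfile.bed Samplelist.txt
```

Here the bed file is given first and the sample list second, so each sample's file is written in full and closed before the next one is opened.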