how to automate where a column name is a parameter

Time:07-27

I need to automate a command-line tool (pvacseq) so that, for each file, the 12th column of the row that contains "CHROM" is passed as a parameter. This is what I have come up with:

for i in *.vcf  ; do  pvacseq run \ 
$i \ 
awk'/CHROM/{print $12}' ${i}\ 
HLA-A*02:01,HLA-B*35:01,DRB1*11:01 \
MHCflurry MHCnuggetsI MHCnuggetsII NNalign NetMHC PickPocket SMM SMMPMBEC SMMalign \ 
"${i%%_vep*}_pvac-result"   ; done

The problem seems to be that at each space in the awk command, pvacseq understands that a new option has been introduced.

So I guess what I need is either a way to extract that column in a single command without spaces, or a way to make the program understand that the awk command is a single command even though it contains spaces.

As for reproducing the problem, I don't know how to approach it, since installing pvacseq can be complicated.

CodePudding user response:

Assumptions/Understandings:

  • the current code has pvacseq trying to parse the pieces of the awk command (awk, /CHROM/{print $12}, ...) as literal arguments, when what OP wants is for the result of the awk call to be fed into the pvacseq call
  • each awk call retrieves exactly one item from file $i (OP should confirm if it's possible no matches, or multiple matches, can be found and if so how to proceed)
  • the result of the awk call is to be used to build the pvacseq call
  • we don't have to worry about escaping any characters in the awk result that could lead pvacseq to misread the input

A couple ideas:

Option #1

Before building/executing the pvacseq call run the awk code separately and store the result in a variable, then pass the variable to pvacseq:

for i in *.vcf
do
    x=$(awk '/CHROM/{print $12}' "$i")   # capture awk's output in a variable
    pvacseq run "$i" "$x" HLA-A*02:01,HLA-B*35:01,DRB1*11:01 ... 
    #                ^^^^
done
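
The second assumption above (exactly one match per file) can be checked before pvacseq is ever called. The following is only a sketch: get_col12 is a hypothetical helper name, and the demo file with sample names S1, S2, S3 is made up for illustration. It prints column 12 of the first line containing "CHROM" and returns a non-zero status when there is no such line or the line has fewer than 12 fields.

```shell
# Hypothetical helper (not part of pvacseq): print field 12 of the first
# line containing "CHROM"; fail if there is no such line or no 12th field.
get_col12() {
    awk '/CHROM/ { if (NF >= 12) { print $12; found = 1 } exit }
         END     { exit !found }' "$1"
}

# Made-up header line standing in for a real .vcf file.
demo=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n' > "$demo"

if sample=$(get_col12 "$demo"); then
    echo "would pass to pvacseq: $sample"
else
    echo "no usable CHROM line in $demo" >&2
fi
rm -f "$demo"
```

Inside the loop, the same if/else could be used to skip a file instead of calling pvacseq with an empty argument.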

Option #2:

Leave the awk call where it is but wrap it in a command substitution, $( ... ); bash runs the awk command in a subshell first and then substitutes its output into the pvacseq call (i.e., pvacseq never sees the text of the awk command itself, only its result):

for i in *.vcf
do
    pvacseq run "$i" "$(awk '/CHROM/{print $12}' "$i")" HLA-A*02:01,HLA-B*35:01,DRB1*11:01 ... 
    #                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
done
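
To see the difference without installing pvacseq, a stand-in function can report how many arguments it receives. This is a rough sketch: count_args is a hypothetical placeholder for pvacseq, and the one-line demo file and the out_dir argument are made up.

```shell
# Hypothetical stand-in for pvacseq: prints how many arguments it was given.
count_args() { echo "$#"; }

# Made-up one-line file whose 12th whitespace-separated field is TUMOR_01.
demo_vcf=$(mktemp)
printf '#CHROM a b c d e f g h i j TUMOR_01\n' > "$demo_vcf"

# awk written out literally: every word becomes its own argument.
literal_count=$(count_args run "$demo_vcf" awk '/CHROM/{print $12}' "$demo_vcf" out_dir)

# awk inside a quoted command substitution: its output is one argument.
quoted_count=$(count_args run "$demo_vcf" "$(awk '/CHROM/{print $12}' "$demo_vcf")" out_dir)

echo "literal: $literal_count arguments"   # prints "literal: 6 arguments"
echo "quoted:  $quoted_count arguments"    # prints "quoted:  4 arguments"
rm -f "$demo_vcf"
```

The double quotes around $( ... ) keep the result as a single argument even if it were to contain spaces.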

CodePudding user response:

Perhaps it is best to pass the column number as a variable. Say you have the column number stored in a bash variable column_number; you could then run:

awk -v colno="$column_number" '/CHROM/{print $colno}' ${i} ....
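
As a small illustration of the -v approach, the snippet below runs the same awk program on a made-up line instead of a real file. The input line and the variable name picked are assumptions for the demo only.

```shell
# Made-up input line; field N is literally named cN so the result is easy to check.
line='#CHROM c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12'
column_number=12

# -v copies the shell variable into the awk variable colno before the program runs.
picked=$(printf '%s\n' "$line" | awk -v colno="$column_number" '/CHROM/{print $colno}')
echo "$picked"   # prints c12
```

Changing column_number selects a different field without editing the awk program.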