I need to automate a command-line program (pvacseq tools) where the 12th column of the row that contains "CHROM" in each file must be passed as a parameter. This is what I have so far:
for i in *.vcf ; do pvacseq run \
$i \
awk'/CHROM/{print $12}' ${i}\
HLA-A*02:01,HLA-B*35:01,DRB1*11:01 \
MHCflurry MHCnuggetsI MHCnuggetsII NNalign NetMHC PickPocket SMM SMMPMBEC SMMalign \
"${i%%_vep*}_pvac-result" ; done
The problem seems to be that pvacseq treats each space-separated piece of the awk command as a new option.
So I guess what I need is either a way to extract that column with a single command that contains no spaces, or a way to make the program understand that the awk command is a single command even though it contains spaces.
As for reproducing the problem, I don't know how to approach it, since installing pvacseq can be complicated.
CodePudding user response:
Assumptions/Understandings:
- the current code has pvacseq trying to parse the literal strings $(awk, '/CHROM/{print, $12 ... as arguments, when what OP wants is for the results of the awk call to be fed into the pvacseq call
- each awk call retrieves exactly one item from file $i (OP should confirm if it's possible no matches, or multiple matches, can be found and if so how to proceed)
- the result of the awk call is to be used to build the pvacseq call
- we don't have to worry about escaping any characters in the awk result that could lead pvacseq to misread the input
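If the "exactly one match" assumption cannot be guaranteed, a guard along the following lines may help. This is only a sketch: the sample file and its contents are invented for illustration, and what to do on zero or multiple matches (here, skip the file) is a choice OP would need to make.

```shell
#!/usr/bin/env bash
# Made-up sample file standing in for a real VCF (assumption, not OP's data).
tmp=$(mktemp)
printf '%s\n' \
    '##fileformat=VCFv4.2' \
    '#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR SAMPLE_12' \
    'chr1 100 . A T . PASS . GT 0/0 0/1 0/1' > "$tmp"

# Count the lines that match /CHROM/; n+0 forces a numeric 0 when there are none.
count=$(awk '/CHROM/{n++} END{print n+0}' "$tmp")

if [ "$count" -eq 1 ]; then
    # Exactly one match: safe to extract the 12th field.
    x=$(awk '/CHROM/{print $12}' "$tmp")
    status="ok: $x"
else
    # Zero or several matches: report and skip instead of guessing.
    status="skip: expected 1 match, found $count"
fi

echo "$status"
rm -f "$tmp"
```

With the sample file above, only the header line matches, so the script reports the 12th field of that line.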
A couple ideas:
Option #1:
Before building/executing the pvacseq call, run the awk code separately and store the result in a variable, then pass the variable to pvacseq:
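for i in *.vcf
do
    # run awk first and capture the 12th field of the CHROM line
    x=$(awk '/CHROM/{print $12}' "$i")
    # quote the variables, and the allele list so the shell does not glob-expand the '*'
    pvacseq run "$i" "$x" 'HLA-A*02:01,HLA-B*35:01,DRB1*11:01' ...
    #                ^^^^
done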
Option #2:
Leave the awk call where it is but wrap it in a command substitution, $( ... ); bash will run the awk command first and then feed the result into the pvacseq call (ie, pvacseq will not try to parse the literal strings $(awk, '/CHROM/{print, $12 ...):
for i in *.vcf
do
    # the quoted $( ... ) is replaced by awk's output before pvacseq starts
    pvacseq run "$i" "$(awk '/CHROM/{print $12}' "$i")" 'HLA-A*02:01,HLA-B*35:01,DRB1*11:01' ...
    #                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
done
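To see the difference without installing pvacseq, the call can be replaced by a stand-in that only reports the arguments it receives. The sketch below is an illustration under assumptions: fake_pvacseq is a hypothetical helper (not part of pvacseq), and the sample header line is invented.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for pvacseq: prints how many arguments it got and what they were.
fake_pvacseq() {
    echo "$# args:$(printf ' [%s]' "$@")"
}

# Made-up one-line sample file; awk splits on spaces and tabs alike.
tmp=$(mktemp)
printf '%s\n' '#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1 S2 TUMOR_01' > "$tmp"

# Without substitution: awk, its program text and the file name arrive as literal words.
literal=$(fake_pvacseq run "$tmp" awk '/CHROM/{print $12}' "$tmp")

# With substitution: the shell runs awk first and passes only its output.
substituted=$(fake_pvacseq run "$tmp" "$(awk '/CHROM/{print $12}' "$tmp")")

echo "$literal"
echo "$substituted"
rm -f "$tmp"
```

The first call reports five arguments (including the word awk itself), while the second reports three, the last of which is the extracted field.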
CodePudding user response:
Perhaps the best approach is to pass the column number as a variable. Say you have the column number stored in a bash variable column_number; you could then do:
awk -v colno="$column_number" '/CHROM/{print $colno}' "$i" ....
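A minimal, self-contained sketch of the -v approach follows. The sample file is invented for illustration, and column_number is set by hand here; in practice it would come from wherever the column number is decided.

```shell
#!/usr/bin/env bash
# Made-up one-line sample file (assumption, not real data).
tmp=$(mktemp)
printf '%s\n' '#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1 S2 TUMOR_01' > "$tmp"

column_number=12

# -v copies the shell value into the awk variable colno before the program runs;
# inside awk, $colno means "the field whose number is stored in colno".
value=$(awk -v colno="$column_number" '/CHROM/{print $colno}' "$tmp")

echo "$value"
rm -f "$tmp"
```

Passing the number with -v keeps the awk program in single quotes, so the shell never has to interpolate anything into the awk source.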