I am using ksh on AIX.
I have a file with multiple comma delimited fields. The value of each field is read into a variable inside the script.
The last field in the file may contain multiple | delimited values. I need to test each value and keep the first one that doesn't begin with R, then stop testing the values.
Sample value of $principal_diagnosis0: R65.20|A41.9|G30.9|F02.80
I've tried:
echo $principal_diagnosis0 | awk -F"|" '{for (i = 1; i<=NF; i++) {if ($i !~ "R"){echo $i; primdiag = $i}}}'
but I get this message : awk: Field $i is not correct.
My goal is to have a variable that I can use outside of the awk statement that gets assigned the first non-R code (in this case it would be A41.9).
echo $principal_diagnosis0 | awk -F"|" '{for (i = 1; i<=NF; i++) {if ($i !~ "R"){print $i}}}'
gets me the output of :
A41.9
G30.9
F02.80
So I know it's reading the values and evaluating properly. But I need to stop after the first match and be able to use that value outside of awk.
Thanks!
CodePudding user response:
you can make FS and OFS do all the hard work :
echo "${principal_diagnosis0}" |
mawk NF=NF FS='^(R[^|]+[|])+|[|].+$' OFS=
A41.9
——————————————————————————————————————————
another slightly different variation of the same concept, overwriting fields but leaving OFS as is :
gawk -F'^.*R[^|]+[|]|[|].+$' '$--NF=$--NF'
A41.9
this works, because when you break it out :
gawk -F'^.*R[^|]+[|]|[|].+$' '
{ print NF
} $(_=--NF)=$(__=--NF) { print _, __, NF, $0 }'
3
1 2 1 A41.9
you'll notice you start with NF = 3, and the two subsequent decrements make it equivalent to $1 = $2, but since the final NF is now reduced to just 1, it prints the value once instead of 2 copies of it
…… which means you can also make it $0 = $2, as such :
gawk -F'^.*R[^|]+[|]|[|].+$' '$-_=$--NF'
A41.9
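To see where NF = 3 comes from, it can help to print every field the separator regex produces. This is only a rough sketch: it assumes the plain awk on your PATH handles a leading ^ in FS the way gawk does, and the fields variable is just a name made up for the demo. The separator matches once at the start (the R code plus its |) and once at the end (everything from the second | onward), so the first and last fields come out empty and the wanted code sits in $2.

```shell
principal_diagnosis0='R65.20|A41.9|G30.9|F02.80'

# Print NF, then each field wrapped in brackets so the empty ones are visible.
# "fields" is a hypothetical variable name used only for this demo.
fields=$(printf '%s\n' "$principal_diagnosis0" |
  awk -F'^.*R[^|]+[|]|[|].+$' '{ print NF; for (i = 1; i <= NF; i++) printf "[%s]\n", $i }')

printf '%s\n' "$fields"
# 3
# []
# [A41.9]
# []
```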
——————————————————————————————————————————
a 3rd variation, this time using RS instead of FS :
mawk NR==2 RS='^.*R[^|]+[|]|[|].+$'
A41.9
——————————————————————————————————————————
and if you REALLY don't wanna mess with FS/OFS/RS, use gsub() instead :
nawk 'gsub("^.*R[^|]+[|]|[|].+$",_)'
A41.9
CodePudding user response:
To answer your specific question:
$ principal_diagnosis0='R65.20|A41.9|G30.9|F02.80'
$ foo=$(echo "$principal_diagnosis0" | awk -v RS='|' '/^[^R]/{sub(/\n/,""); print; exit}')
$ echo "$foo"
A41.9
The above will work with any awk. You can do it more briefly with GNU awk, if you have it:
foo=$(echo "$principal_diagnosis0" | awk -v RS='[|\n]' '/^[^R]/{print; exit}')
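If you would rather keep the field loop from the question, a minimal sketch along the same lines is to print the first qualifying field and exit right away, capturing the output with command substitution as above. Note the assumption baked into the regex: /^R/ only rejects values that begin with R, whereas the original "R" test rejects any value containing an R anywhere.

```shell
principal_diagnosis0='R65.20|A41.9|G30.9|F02.80'

# Split on "|", print the first field that does not start with R, then stop.
primdiag=$(printf '%s\n' "$principal_diagnosis0" |
  awk -F'|' '{ for (i = 1; i <= NF; i++) if ($i !~ /^R/) { print $i; exit } }')

echo "$primdiag"
# A41.9
```

A variable assigned inside awk (like primdiag = $i in the question) is never visible to the shell, which is why the value has to come back through awk's output.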