I want to take the output of one tool, e.g.,
echo US59606
and use it as the value of an awk variable. The variable and its newly assigned value are then used in an awk program.
I (incorrectly) thought that this is the way to do it:
echo US59606 | awk -v arpt_ident=$1 -f anav_records_for_arpt_ident.awk ANAV.TXT
For debugging purposes, in the awk program I printed the value of the variable arpt_ident. The output I got is the empty string. Bummer.
Question: How do I run awk with a variable (or variables) whose value(s) come from stdin? Here's the general structure of the pipeline that I want awk to be a part of:
some_cmd | awk -v variable1=??? -v variable2=??? -f my_program.awk my_file.txt
where some_cmd produces two values: the first value is to be assigned to variable1, the second value is to be assigned to variable2. How do I assign variable1 and variable2 the correct values from stdin (the output of some_cmd)?
CodePudding user response:
One idea would be to treat stdin as the 1st file (FNR==NR) and have awk save the input values either in a pre-defined set of variables or as entries in an array.
A couple of variations on using an array, based on the stdin format ... a) multiple values on a single line vs b) each value on a separate line:
# values as space-delimited strings on a single line of input:
echo val1 val2 val3 |
awk '
FNR==NR { for (i=1;i<=NF;i++)
              var[i] = $i
          for (i=1;i<=NF;i++)
              printf "input: var[%d] = %s\n", i, var[i]
          next }
{ print "do something with 2nd input file" }
' - somefile
# values on separate input lines
printf "val1\nval2\nval3\n" |
awk '
FNR==NR { var[++c]=$0
          next
        }
FNR==1  { for (i=1;i<=c;i++)
              printf "input: var[%d] = %s\n", i, var[i]
}
{ print "do something with 2nd input file" }
' - somefile
Where:
- the "- somefile" arguments say to take stdin (-) as the 1st input file and somefile as the 2nd input file
- by processing stdin as a 'file' we eliminate the need for command-line -v var=val clauses
- the awk script would then reference the array entries instead of the variables (eg, replace arpt_ident with var[1])
Both of these generate:
input: var[1] = val1
input: var[2] = val2
input: var[3] = val3
do something with 2nd input file
do something with 2nd input file
do something with 2nd input file
do something with 2nd input file
... snip ...
Or, if individual variable names are required:
echo val1 val2 val3 |
awk '
FNR==NR { arpt_ident=$1
          id=$2
          var3=$3
          next
        }
... snip ...
' - somefile
printf "val1\nval2\nval3\n" |
awk '
FNR==NR { if (FNR==1) arpt_ident=$1   # alternatively look at using "switch" (aka awk case statement)
          if (FNR==2) id=$1           # each value is on its own line, so it is always field $1
          if (FNR==3) var3=$1
          next
        }
... snip ...
' - somefile
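Tying this back to the original one-value pipeline, here is a minimal self-contained sketch. The sample file contents and the $1 == arpt_ident match are assumptions, since the contents of anav_records_for_arpt_ident.awk were never shown:

```shell
# sample data file standing in for ANAV.TXT (contents are made up)
printf 'US59606 recordA\nUS99999 recordB\n' > /tmp/anav_sample.txt

# stdin ('-') is the 1st input file: its single line sets arpt_ident;
# the 2nd file is then filtered against that value
echo US59606 |
awk '
FNR==NR { arpt_ident = $1; next }
$1 == arpt_ident { print }
' - /tmp/anav_sample.txt
# prints: US59606 recordA
```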
CodePudding user response:
For a file like the one below:
cat file1
1 line0
2 line1
3 line2
4 line3
Something like this will use the stdin to filter the file:
echo "line2" | awk -v awkvar="$(</dev/stdin)" '$2 ~ awkvar {print}' file1
3 line2
CodePudding user response:
If some_cmd would produce only a single "value" (i.e., you want to pass the whole stdout of the command to a variable in awk), you can do a
awk -v var1="$(some_cmd)" ....
If it produces two values, as in your question, you must first decide how these values are stored in the standard output. Let's assume that there are two words in the bash sense of word splitting. In this case, it is easiest to use an auxiliary bash array:
cmd_output=( $(some_cmd) )
awk -v var1="${cmd_output[0]}" -v var2="${cmd_output[1]}" ....
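If the command output could contain glob characters, the unquoted expansion feeding the array is fragile; a sketch using read instead (some_cmd and its two sample values are hypothetical stand-ins):

```shell
# hypothetical stand-in for some_cmd producing two whitespace-separated values
some_cmd() { echo "US59606 KABC"; }

# read the two words into named shell variables, then hand them to awk
read -r v1 v2 <<EOF
$(some_cmd)
EOF
awk -v var1="$v1" -v var2="$v2" 'BEGIN { printf "%s %s\n", var1, var2 }'
# prints: US59606 KABC
```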
If the output of your command is more complex, you need to parse it somehow. For instance to extract the first two integers from the output, you could do a
if [[ $(some_cmd) =~ (-?[[:digit:]]+).*(-?[[:digit:]]+) ]]
then
    awk -v var1="${BASH_REMATCH[1]}" -v var2="${BASH_REMATCH[2]}" ...
fi
Depending on your parsing requirements, you may find it easier to do the parsing completely inside awk. In this case, you would pass the whole standard output to a single awk variable and, inside the BEGIN block of your awk program, calculate the variables which are needed later.
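A minimal sketch of that last approach, assuming some_cmd emits two whitespace-separated values (some_cmd and the values are hypothetical stand-ins):

```shell
# hypothetical stand-in for some_cmd
some_cmd() { echo "US59606 KABC"; }

# pass the whole output as one awk variable and split it in the BEGIN block
awk -v raw="$(some_cmd)" '
BEGIN { split(raw, parts, "[[:space:]]+")
        variable1 = parts[1]
        variable2 = parts[2]
        printf "variable1=%s variable2=%s\n", variable1, variable2
}'
# prints: variable1=US59606 variable2=KABC
```

Because the program consists only of a BEGIN block, awk exits without waiting to read any input files.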