Use an array created using awk as a variable in another awk script

Time:10-14

I am trying to use awk to extract data using a conditional statement containing an array created using another awk script.

The awk script I use for creating the array is as follows:

array=($(awk 'NR>1 { print $1 }' < file.tsv))

Then, to use this array in the other awk script, I run:

awk var="${array[@]}"  'FNR==1{ for(i=1;i<=NF;i++){ heading[i]=$i } next } { for(i=2;i<=NF;i++){ if($i=="1" && heading[i] in var){ close(outFile); outFile=heading[i]".txt"; print ">kmer"NR-1"\n"$1 >> (outFile) }}}' < input.txt

However, when I run this, the following error occurs.

awk: fatal: cannot open file 'foo' for reading (No such file or directory)

I've already looked at multiple posts on why this error occurs and on how to correctly pass a shell variable to awk, but none of the suggestions have worked so far. However, when I remove the shell variable and the condition that uses it, the script does work:

awk 'FNR==1{ for(i=1;i<=NF;i++){ heading[i]=$i } next } { for(i=2;i<=NF;i++){ if($i=="1"){ close(outFile); outFile=heading[i]".txt"; print ">kmer"NR-1"\n"$1 >> (outFile) }}}' < input.txt

I really need that conditional statement but don't know what I am doing wrong with implementing the bash variable in awk and would appreciate some help.

Thx in advance.

CodePudding user response:

Just read both files with a single awk invocation:

awk '
    FNR == NR && FNR > 1 {
        var[$1]
        next
    }
    FNR == 1 {
        for (i = 1; i <= NF; i++)
            heading[i] = $i
        next
    }
    {
        for (i = 2; i <= NF; i++)
            if ( $i == "1" && heading[i] in var) {
                close(outFile)
                outFile = heading[i] ".txt"
                print ">kmer" (FNR-1) "\n" $1 >> (outFile)
            }
    }
' file.tsv input.txt
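
The script above relies on how awk numbers records when it is given two files. As a minimal sketch of that behavior, using made-up sample files (the contents and the `tmp` and `trace` names are assumptions for illustration, not the asker's real data):

```shell
# Hypothetical sample files standing in for file.tsv and input.txt.
tmp=$(mktemp -d)
printf 'id\ns1\ns3\n' > "$tmp/file.tsv"
printf 'kmer s1 s2 s3\nAAA 1 1 0\n' > "$tmp/input.txt"

# NR counts records across all input files; FNR restarts at 1 for each file,
# so NR == FNR is true only while the first file is being read.
trace=$(awk '{
    which = (NR == FNR) ? "first" : "second"
    print which, "NR=" NR, "FNR=" FNR
}' "$tmp/file.tsv" "$tmp/input.txt")

printf '%s\n' "$trace"
rm -rf "$tmp"
```

The three lines of the first file are reported as "first" with NR equal to FNR, and the two lines of the second file as "second" with NR=4 and NR=5 but FNR=1 and FNR=2. Any line counter meant to be relative to the second file should therefore be based on FNR rather than NR.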

CodePudding user response:

You can store the values as a single string in a shell variable, then use the split function to turn that string into an awk array. Consider the following simple example. Let the content of file1.txt be

A B C
D E F
G H I

and the content of file2.txt be

1
3
2

then

var1=$(awk '{print $1}' file1.txt)
awk -v var1="$var1" 'BEGIN{split(var1,arr)}{print "First column value in line number",$1,"is",arr[$1]}' file2.txt

gives output

First column value in line number 1 is A
First column value in line number 3 is G
First column value in line number 2 is D

Explanation: I store the output of the first awk command in a shell variable and pass it with -v to the second awk command, where it is used as the first argument to the split function. Disclaimer: this solution assumes all files involved use a delimiter compatible with default GNU AWK behavior, i.e. one or more whitespace characters always act as the delimiter.

(tested in gawk 4.2.1)
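
One caveat when applying this to the question: split stores the values under the numeric indices 1..n, while awk's "in" operator tests indices, not values. For a membership test the values have to become keys first. Also, -v expects a single var=value argument, so a "${array[@]}" expansion that produces several words leaves awk treating the extra words as program text or file names, which is likely where the "cannot open file" error comes from. Below is a minimal sketch under those assumptions, using hypothetical sample data and illustrative names (`names`, `list`, `wanted`, `result`); it prints the matches instead of writing the .txt files:

```shell
# Hypothetical sample data standing in for file.tsv and input.txt.
tmp=$(mktemp -d)
printf 'id\tdesc\ns1\tx\ns3\ty\n' > "$tmp/file.tsv"
printf 'kmer s1 s2 s3\nAAA 1 1 0\nCCC 0 1 1\n' > "$tmp/input.txt"

# Join the first-column values into ONE space-separated string,
# so that awk -v receives a single argument.
names=$(awk 'NR > 1 { printf "%s ", $1 }' "$tmp/file.tsv")

result=$(awk -v names="$names" '
    BEGIN {
        n = split(names, list)      # list[1..n] holds the names as values
        for (k = 1; k <= n; k++)
            wanted[list[k]]         # referencing creates the key, so "in" works
    }
    FNR == 1 {                      # header row: remember the column names
        for (i = 1; i <= NF; i++)
            heading[i] = $i
        next
    }
    {
        for (i = 2; i <= NF; i++)
            if ($i == "1" && (heading[i] in wanted))
                print heading[i], $1
    }
' "$tmp/input.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data only the s1 and s3 columns are listed in file.tsv, so the sketch reports "s1 AAA" and "s3 CCC" and skips the s2 column. In bash, "${array[*]}" would similarly join the array elements into one word before passing them with -v.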
