Loop over values in multiple colums with condition and print first column value if true in AWK-CodePudding

My data is as follows (this is just a sample, real data has ~20,000 lines) :

Original raw data (tsv):

Names   USA EU  FR
Jim     3   12  5
John    8   4   7
Jane    12  35  3
Sue     6   3   9

Image of original Data:

I want to loop over each col starting with col 2 and if values are greater than 5, then print the column 1 value of that line. The resulting data should be as follows (tab separated):

Resulting Raw Data (tsv):

USA   EU   FR
John  Jim  John
Jane  Jane Sue
Sue

Image of resulting data:

I have tried the following:

    awk -F"\t" '{ 
            for(i=2; i<=NF; i  ) {
                if($i > 5){
                    print $1 
                } 
            }
        }' file > results

But it prints all the results in a single column and doesnt grab the headers of the other cols. I have also tried to capture the header during each loop step with print FNR == i {print $1} "\n" but I get a syntax error related to the index:

    awk: cmd. line:4:                 print FNR == i {print $1} "\n"
    awk: cmd. line:4:                                ^ syntax error
    awk: cmd. line:8:     }
    awk: cmd. line:8:     ^ syntax error

CodePudding user response：

The expected output you provided doesn't show what you describe as your requirements so maybe this is what you really want:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR > 1 {
    for ( i=2; i<=NF; i   ) {
        $i = ( $i > 5 ? $1 : "" )
    }
}
{
    $1 = ""
    sub(OFS,"")
    print
}

$ awk -f tst.awk file
USA     EU      FR
        Jim
John            John
Jane    Jane
Sue             Sue

CodePudding user response：

One GNU awk (for multidimensional arrays) idea:

awk '
BEGIN { FS=OFS="\t" }
NR==1 { for (col=2;col<=NF;col  )
            output=output (col==2 ? "" : OFS) $col
        print output
        next
      }
      { for (col=2;col<=NF;col  )
            if ($col > 5)
               names[  count[col]][col]=$1
      }
END   { for (col=2;col<=NF;col  )  
            maxlines=(count[col]>maxlines ? count[col] : maxlines)
        for (lineno=1;lineno<=maxlines;lineno  ) {
            output=""
            for (col=2;col<=NF;col  )
                output=output (col==2 ? "" : OFS) names[lineno][col]
            print output
        }
      }
' file.tsv

This generates:

USA     EU      FR
John    Jim     John
Jane    Jane    Sue
Sue