Home > database >  Loop over values in multiple colums with condition and print first column value if true in AWK
Loop over values in multiple colums with condition and print first column value if true in AWK

Time:06-29

My data is as follows (this is just a sample, real data has ~20,000 lines) :

Original raw data (tsv):

Names   USA EU  FR
Jim     3   12  5
John    8   4   7
Jane    12  35  3
Sue     6   3   9

Image of original Data:

Orig data

I want to loop over each col starting with col 2 and if values are greater than 5, then print the column 1 value of that line. The resulting data should be as follows (tab separated):

Resulting Raw Data (tsv):

USA   EU   FR
John  Jim  John
Jane  Jane Sue
Sue 

Image of resulting data:

Resulting data

I have tried the following:

    awk -F"\t" '{ 
            for(i=2; i<=NF; i  ) {
                if($i > 5){
                    print $1 
                } 
            }
        }' file > results

But it prints all the results in a single column and doesnt grab the headers of the other cols. I have also tried to capture the header during each loop step with print FNR == i {print $1} "\n" but I get a syntax error related to the index:

    awk: cmd. line:4:                 print FNR == i {print $1} "\n"
    awk: cmd. line:4:                                ^ syntax error
    awk: cmd. line:8:     }
    awk: cmd. line:8:     ^ syntax error

CodePudding user response:

The expected output you provided doesn't show what you describe as your requirements so maybe this is what you really want:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR > 1 {
    for ( i=2; i<=NF; i   ) {
        $i = ( $i > 5 ? $1 : "" )
    }
}
{
    $1 = ""
    sub(OFS,"")
    print
}

$ awk -f tst.awk file
USA     EU      FR
        Jim
John            John
Jane    Jane
Sue             Sue

CodePudding user response:

One GNU awk (for multidimensional arrays) idea:

awk '
BEGIN { FS=OFS="\t" }
NR==1 { for (col=2;col<=NF;col  )
            output=output (col==2 ? "" : OFS) $col
        print output
        next
      }
      { for (col=2;col<=NF;col  )
            if ($col > 5)
               names[  count[col]][col]=$1
      }
END   { for (col=2;col<=NF;col  )  
            maxlines=(count[col]>maxlines ? count[col] : maxlines)
        for (lineno=1;lineno<=maxlines;lineno  ) {
            output=""
            for (col=2;col<=NF;col  )
                output=output (col==2 ? "" : OFS) names[lineno][col]
            print output
        }
      }
' file.tsv

This generates:

USA     EU      FR
John    Jim     John
Jane    Jane    Sue
Sue
  • Related