How to count the occurrence of negative and positive values in a column using awk?

I have a file that looks like this:

FID IID data1 data2 data3 
1   RQ00001-2   1.670339    -0.792363849    -0.634434791    
2   RQ00002-0   -0.238737767    -1.036163943    -0.423512414
3   RQ00004-9   -0.363886913    -0.98661685 -0.259951265
3   RQ00004-9   -9  -0.98661685 0.259951265

I want to count the number of positive numbers in column 3 (data1) versus negative numbers, excluding -9. For column 3 that gives 1 positive versus 2 negative; I didn't count -9 because it stands for missing data. For data2, this would be 4 negative versus 0 positive. For the last column it will be 3 negative versus 1 positive.

I would prefer to use awk, but I am new to it and need help. The command below counts every value that starts with -, but I need it to exclude -9. Is there a more sophisticated way of doing this?

awk '$3 ~ /^-/{cnt++} END{print cnt}' filename.txt

CodePudding user response：

You can use this awk solution:

awk -v c=3 '
NR > 1 && $c != -9 {
   if ($c < 0)
        ++neg
   else
        ++pos
}
END {
   printf "Positive: %d, Negative: %d\n", pos, neg
}' file

Positive: 1, Negative: 2

Running it with c=5:

awk -v c=5 'NR > 1 && $c != -9 {if ($c < 0) ++neg; else ++pos} END {printf "Positive: %d, Negative: %d\n", pos, neg}' file

Positive: 1, Negative: 3
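
To cover every data column with this single-column approach, one option is a small shell loop over the column numbers. This is only a sketch: it assumes the sample data from the question is saved as file (the heredoc recreates it), and the "column N:" prefix is added purely for illustration.

```shell
# Recreate the sample data from the question (the file name is an assumption).
cat > file <<'EOF'
FID IID data1 data2 data3
1   RQ00001-2   1.670339    -0.792363849    -0.634434791
2   RQ00002-0   -0.238737767    -1.036163943    -0.423512414
3   RQ00004-9   -0.363886913    -0.98661685 -0.259951265
3   RQ00004-9   -9  -0.98661685 0.259951265
EOF

# Run the same awk program once per column; -9 (missing data) is skipped.
summary=$(for c in 3 4 5; do
    printf 'column %s: ' "$c"
    awk -v c="$c" 'NR > 1 && $c != -9 { if ($c < 0) ++neg; else ++pos }
        END { printf "Positive: %d, Negative: %d\n", pos, neg }' file
done)
printf '%s\n' "$summary"
```

This reads the file once per column, which is fine for small inputs; for large files a single pass over all columns is likely the better choice.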

CodePudding user response：

Assumptions:

determine the number of negative and positive values for the 3rd thru Nth columns

One awk idea:

awk '
NR>1  { for (i=3;i<=NF;i++) {
                 if ($i == -9) continue
            else if ($i <   0) neg[i]++
            else if ($i >   0) pos[i]++
        }
      }
END   { printf "Neg/Pos"
        for (i=3;i<=NF;i++)
            printf "%s%s/%s",OFS,neg[i]+0,pos[i]+0
        print ""
      }
' filename.txt

This generates:

Neg/Pos 2/1 4/0 3/1

NOTE: OP hasn't provided an example of the expected output; all of the counts are located in the arrays so modifying the output format should be relatively easy once OP has provided a sample output

CodePudding user response：

$ awk '
NR == 1 {
  for(i = 3; i <= NF; i++) header[i] = $i
}
NR > 1 {
  for(i = 3; i <= NF; i++) {
    pos[i] += ($i >= 0); neg[i] += (($i != -9) && ($i < 0))
  }
}
END {
  for(i in pos) {
    if (header[i] == "") header[i] = "column " i
    printf("%-10s: %d positive, %d negative\n", header[i], pos[i], neg[i])
  }
}' file
data1     : 1 positive, 2 negative
data2     : 0 positive, 4 negative
data3     : 1 positive, 3 negative

CodePudding user response：

awk '
NR > 1 && $3 != -9 {$3 >= 0 ? ++p : ++n}
END {print "pos: "p+0, "neg: "n+0}' file

Gives:

pos: 1 neg: 2

You can change ++n to --p to get a single number p, equal to the number of positive values minus the number of negative values.
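
A minimal sketch of that net-count variant, assuming the sample data from the question is saved as file (the heredoc recreates it) and that column 3 is the one of interest:

```shell
# Recreate the sample data from the question (the file name is an assumption).
cat > file <<'EOF'
FID IID data1 data2 data3
1   RQ00001-2   1.670339    -0.792363849    -0.634434791
2   RQ00002-0   -0.238737767    -1.036163943    -0.423512414
3   RQ00004-9   -0.363886913    -0.98661685 -0.259951265
3   RQ00004-9   -9  -0.98661685 0.259951265
EOF

# p goes up for each non-negative value and down for each negative one;
# rows where column 3 is -9 (missing data) are skipped.
net=$(awk 'NR > 1 && $3 != -9 { $3 >= 0 ? ++p : --p } END { print p+0 }' file)
echo "net: $net"
```

A negative result means the column holds more negative than positive values.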

CodePudding user response：

Below are some examples of how you can achieve this:

Note: we assume that -0.0 and 0.0 are positive.

Count negative numbers in column n:

$ awk '(FNR>1){c+=($n<0)}END{print "pos:",(NR-1-c),"neg:"c+0}' file

Count negative numbers in column n, but ignore -9:

$ awk '(FNR>1){c+=($n<0);d+=($n==-9)}END{print "pos:",(NR-1-c),"neg:"c-d}' file

Count negative numbers columns m to n:

$ awk '(FNR>1){for(i=m;i<=n;++i) c[i]+=($i<0)}
       END{for(i=m;i<=n;++i) print i,"pos:",(NR-1-c[i]),"neg:"c[i]+0}' file

Count negative numbers in columns m to n, but ignore -9:

$ awk '(FNR>1){for(i=m;i<=n;++i) {c[i]+=($i<0);d[i]+=($i==-9)}}
       END{for(i=m;i<=n;++i) print i,"pos:",(NR-1-c[i]),"neg:"c[i]-d[i]}' file
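
The snippets above leave m and n to be filled in. A runnable sketch of the last case, assuming they are passed with -v and that the sample data from the question is saved as file (the heredoc recreates it). Here c[i] counts every negative value, including -9, so the positives are the remaining rows and the true negatives are c[i] minus the -9 count d[i]:

```shell
# Recreate the sample data from the question (the file name is an assumption).
cat > file <<'EOF'
FID IID data1 data2 data3
1   RQ00001-2   1.670339    -0.792363849    -0.634434791
2   RQ00002-0   -0.238737767    -1.036163943    -0.423512414
3   RQ00004-9   -0.363886913    -0.98661685 -0.259951265
3   RQ00004-9   -9  -0.98661685 0.259951265
EOF

# m and n are the first and last column to inspect (3 and 5 are assumptions).
out=$(awk -v m=3 -v n=5 '
FNR > 1 { for (i = m; i <= n; ++i) { c[i] += ($i < 0); d[i] += ($i == -9) } }
END     { for (i = m; i <= n; ++i)
              print i, "pos:", NR - 1 - c[i], "neg:", c[i] - d[i] }
' file)
printf '%s\n' "$out"
```

NR - 1 is the number of data rows when a single file with one header line is read.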