Count percent of lines that pass AWK filter?

Time:09-04

I have a file (my_file) and want to compute what percentage of the values in column 11 are < .05:

I try:

echo $($(cat my_file | cut -f 11 | awk '$1 < 5E-2'  | wc -l) / $(cat my_file | cut -f 11 |  wc -l))

I get 1158532: command not found

Could anyone please help me see where I am wrong?

CodePudding user response:

Consider the string:

$(cat my_file | cut -f 11 | awk '$1 < 5E-2' | wc -l)

The $() construct is a "command substitution": the commands inside it are executed and the whole construct is replaced by their output. Because you have nested two substitutions, the output of the inner pipeline is itself executed as a command by the outer $(). If the pipeline produces the output "1158532", bash attempts to execute that string as a command; since there is no command named 1158532 in your PATH, you get the error message that you see. You really should just do this whole thing in awk with something like:

awk '$11 < 0.05 {c++} END {printf "%2.2f%%\n", 100.0 * c / NR}' my_file
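The failure mode can be reproduced in isolation; here is a minimal sketch (the literal value 1158532 stands in for whatever your pipeline prints):

```shell
out=$(echo 1158532)      # command substitution: out now holds the text "1158532"
echo "$out"              # prints 1158532

# Trying to run that text as a command fails, just like the nested $( $(...) ):
"$out" 2>/dev/null || echo "$out: command not found (as expected)"
```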

To help understand why your command does not work, it might help to consider "fixing" it to be:

expr "$( cat my_file | cut -f 11 | awk '$1 < 5E-2'  | wc -l)" / "$(cat my_file | cut -f 11 |  wc -l)"

but notice that this will produce 0 or 1, since expr arithmetic is integer, not floating point. You could get floating-point values by running the numbers through bc with:

echo "$( cat my_file | cut -f 11 | awk '$1 < 5E-2'  | wc -l)" / "$(cat my_file | cut -f 11 |  wc -l)" | bc -l

Note that all of these are Useless Uses of Cat (UUOC) and should be removed (eg, with < my_file cut -f 11), and cut | awk is generally an anti-pattern. Just do the whole thing in awk.

CodePudding user response:

I think you might be able to handle this all via awk:

awk 'BEGIN {cnt=0} { if ($11<.05) cnt+=1 } END {printf "%2.2f%%\n", cnt/NR*100}' my_file

CodePudding user response:

Using only awk:

awk '$11 < 0.05 {c++} END {print c}' my_file

CodePudding user response:

Here is an example of how to transform parts of your command into shorter equivalents:

cat my_file | cut -f 11 | wc -l
cat my_file | wc -l
wc -l < my_file

cat my_file | cut -f 11 | awk '$1 < 5E-2' | wc -l
cat my_file | awk -F'\t' '$11 < 5E-2' | wc -l
awk -F'\t' '$11 < 5E-2' my_file | wc -l
awk -F'\t' '$11 < 5E-2 {c++} END {print c}' my_file

To divide the two results:

awk -F'\t' '$11 < 5E-2 {c++} END {print c/NR}' my_file
0.666667

CodePudding user response:

Count percent of lines that pass AWK filter?

I would harness GNU AWK for this task following way, let file.txt content be

0.01
0.03
0.05
0.07
0.09

then

awk '{cnt+=$1<0.05}END{print cnt/NR*100 "%"}' file.txt

gives output

40%

Explanation: the comparison evaluates to 0 or 1, so I use +=, which increases cnt by 0 when the condition is not met and by 1 when it holds. After all lines are processed I compute the percentage simply by dividing cnt by NR (which, inside END, is the number of all lines) and multiplying by 100. Disclaimer: this solution assumes that file.txt has at least 1 line.

(tested in gawk 4.2.1)
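The boolean-as-number trick can be checked directly on the sample data above (piped in via printf instead of a file):

```shell
# Each comparison contributes 0 or 1 to cnt; two of the five values are < 0.05:
printf '0.01\n0.03\n0.05\n0.07\n0.09\n' |
    awk '{cnt+=($1<0.05)} END {print cnt "/" NR " -> " cnt/NR*100 "%"}'
# prints 2/5 -> 40%
```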

CodePudding user response:

{m,g}awk '
BEGIN {        ___=((_+=++_+_)*_+(\
           __=_*_+--_) )^-!(_ -= _) 
} {  _+=($__)<___ 
} END {

    printf("\n\n\tFilter hit rate :: %.*f %% ( %\47.f / %\47.f )"\
           " \n\n\t%*sFile :: %-.*s \n\n",
                    ___=__--,__*_*__/(__=NR),_,__,
             ___,____,_^=_*=_+=_^=_<_, FILENAME) } ' my_file


Filter hit rate :: 0.04415484271 % ( 3,588 / 8,125,949 )   

           File :: myfile 
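For anyone decoding the golfed program above, an un-golfed equivalent is roughly the following (a sketch: the readable variable names are mine, and the thousands-grouping and padding tricks of the original printf are simplified away):

```shell
awk '
BEGIN { col = 11; thresh = 0.05 }    # the obfuscated BEGIN block builds these constants
      { hits += ($col < thresh) }    # comparison yields 0 or 1 per line
END   { printf "Filter hit rate :: %.2f %% ( %d / %d )\n",
               100 * hits / NR, hits, NR }
' my_file
```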