I have a file (my_file) and want to count how many values in column 11 are < .05, as a fraction of all lines.
I try:
echo $($(cat my_file | cut -f 11 | awk '$1 < 5E-2' | wc -l) / $(cat my_file | cut -f 11 | wc -l))
I get 1158532: command not found
Could anyone please help me see where I am wrong?
Answer:
Consider the string:
$(cat my_file | cut -f 11 | awk '$1 < 5E-2' | wc -l)
The $() construct is a "command substitution". The commands inside $() are executed, and their output replaces the $() text on the command line. In your command the two inner substitutions are themselves wrapped in an outer $(), so after they expand, bash executes the outer contents, a line such as 1158532 / N (where N is the total line count), as a command. The first word, 1158532, is taken as the command name, but there is no command 1158532 in your PATH, so you get the error message that you see. You really should just do this whole thing in awk with something like:
awk '$11 < 0.05 {c++} END {printf "%2.2f%%\n", 100.0 * c / NR}' my_file
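To see the mechanism in isolation, here is a minimal sketch (POSIX sh assumed; 42 and 7 are arbitrary stand-ins for your two counts). It shows a substitution's output being used as a command name, and the exit status 127 the shell returns when no command by that name exists.

```shell
#!/bin/sh
# A substitution's output is pasted into the command line. If it lands in
# the command-name position, the shell tries to run it as a program.
greeting=$( $(printf 'echo') hello )   # expands to: echo hello
echo "$greeting"                       # hello

# Same shape as the question: a number ends up as the command name.
# There is no program called "42", so the shell reports it as not found
# (message discarded here) and sets exit status 127.
status=0
$(printf '42') / 7 2>/dev/null || status=$?
echo "$status"                         # 127
```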
To help understand why your command does not work, it might help to consider "fixing" it to be:
expr "$( cat my_file | cut -f 11 | awk '$1 < 5E-2' | wc -l)" / "$(cat my_file | cut -f 11 | wc -l)"
but notice that this will produce 0 (or 1 if every line matches), since expr does integer arithmetic, not floating point. You could get floating point values by running the data through bc with:
echo "$( cat my_file | cut -f 11 | awk '$1 < 5E-2' | wc -l)" / "$(cat my_file | cut -f 11 | wc -l)" | bc -l
Note that all of these useless uses of cat (UUOC) should be removed (e.g., with < my_file cut -f 11), and cut | awk is generally an anti-pattern. Just do the whole thing in awk.
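Here is a sketch of that advice on a hypothetical sample whose first column contains a space. The file is assumed to be tab-separated, since cut -f splits on tabs by default; that is why the awk version sets -F'\t'. The last command shows what can go wrong with awk's default field splitting (runs of blanks) on such data.

```shell
#!/bin/sh
# Hypothetical sample: column 1 contains a space, column 11 holds the value.
tmp=$(mktemp)
printf 'a b\t2\t3\t4\t5\t6\t7\t8\t9\t10\t0.01\n' > "$tmp"
printf 'c d\t2\t3\t4\t5\t6\t7\t8\t9\t10\t0.90\n' >> "$tmp"

# UUOC removed: redirect the file into cut instead of piping from cat.
via_cut=$(< "$tmp" cut -f 11 | awk '$1 < 5E-2' | wc -l)

# One awk process; -F'\t' splits on tabs exactly like cut's default delimiter.
tab_fs=$(awk -F'\t' '$11 < 5E-2 {c++} END {print c+0}' "$tmp")

# Default FS: "a b" becomes two fields, so $11 is really column 10 (value 10)
# and neither line matches.
default_fs=$(awk '$11 < 5E-2 {c++} END {print c+0}' "$tmp")

echo "$via_cut $tab_fs $default_fs"
rm -f "$tmp"
```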
Answer:
I think you might be able to handle this all via awk:
awk 'BEGIN {cnt=0} { if ($11<.05) cnt+=1 } END {printf "%2.2f%%\n", cnt/NR*100}' my_file
Answer:
Using only awk:
awk '$11 < 0.05 {c++} END {print c}' my_file
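One detail worth knowing: if no line matches, c is never assigned and print c outputs an empty line. The sketch below uses a hypothetical one-line input whose column 11 is 0.50; the print c+0 variant is an adjustment for that case, not part of the answer above.

```shell
#!/bin/sh
# Hypothetical one-line input whose column 11 (0.50) does not pass the filter.
input=$(printf '1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t0.50')

# "print c": c was never incremented, so awk prints an empty string.
plain=$(printf '%s\n' "$input" | awk '$11 < 0.05 {c++} END {print c}')

# "print c+0" forces a numeric value, so a zero count prints as 0.
forced=$(printf '%s\n' "$input" | awk '$11 < 0.05 {c++} END {print c+0}')

printf '[%s] [%s]\n' "$plain" "$forced"   # [] [0]
```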
Answer:
Here is an example of how to transform parts of your command into shorter equivalents:
cat my_file | cut -f 11 | wc -l
cat my_file | wc -l
wc -l < my_file
cat my_file | cut -f 11 | awk '$1 < 5E-2' | wc -l
cat my_file | awk -F'\t' '$11 < 5E-2' | wc -l
awk -F'\t' '$11 < 5E-2' my_file | wc -l
awk -F'\t' '$11 < 5E-2 {c++} END {print c}' my_file
To divide the two results:
awk -F'\t' '$11 < 5E-2 {c++} END {print c/NR}' my_file
0.666667
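A quick way to check that each rewrite is equivalent is to run the forms side by side. This sketch uses a hypothetical three-line, tab-separated sample; the ratio 2/3 prints as 0.666667 because awk's default output format for numbers (OFMT) is %.6g.

```shell
#!/bin/sh
# Hypothetical sample: three tab-separated lines; column 11 is 0.01, 0.02, 0.30.
tmp=$(mktemp)
for v in 0.01 0.02 0.30; do
  printf '1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t%s\n' "$v"
done > "$tmp"

# Line count: both forms print 3.
t1=$(cat "$tmp" | cut -f 11 | wc -l)
t2=$(wc -l < "$tmp")

# Match count: both forms print 2.
m1=$(cat "$tmp" | cut -f 11 | awk '$1 < 5E-2' | wc -l)
m2=$(awk -F'\t' '$11 < 5E-2 {c++} END {print c}' "$tmp")

# Ratio in one pass: 2/3 printed with awk's default OFMT (%.6g).
ratio=$(awk -F'\t' '$11 < 5E-2 {c++} END {print c/NR}' "$tmp")

echo "$t1 $t2 $m1 $m2 $ratio"
rm -f "$tmp"
```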
Answer:
Count percent of lines that pass AWK filter?
I would use GNU AWK for this task in the following way. Let the content of file.txt be
0.01
0.03
0.05
0.07
0.09
then
awk '{cnt+=$1<0.05}END{print cnt/NR*100 "%"}' file.txt
gives output
40%
Explanation: a comparison gives 0 or 1, so I use +=, which increases cnt by 0 when the condition is not met and by 1 when it holds. After all lines are processed, I compute the percentage simply by dividing cnt by NR (which, inside END, is the number of all lines) and multiplying by 100. Disclaimer: this solution assumes that file.txt has at least 1 line.
(tested in gawk 4.2.1)
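Building on the disclaimer, here is a sketch of one way to guard the division when the input is empty. It reuses the five sample values (fed on stdin rather than from file.txt), and the "no lines" message is an arbitrary placeholder.

```shell
#!/bin/sh
# The five sample values from the answer, fed on stdin instead of file.txt.
values='0.01
0.03
0.05
0.07
0.09'

# Same counting idiom, but the division only happens when NR > 0. Without the
# guard, empty input makes gawk stop with a fatal division-by-zero error.
pct() {
  awk '{cnt += ($1 < 0.05)} END {if (NR > 0) print cnt/NR*100 "%"; else print "no lines"}'
}

full=$(printf '%s\n' "$values" | pct)   # 40%
empty=$(pct < /dev/null)                # no lines
echo "$full / $empty"
```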
Answer:
{m,g}awk '
BEGIN { ___=((_ = _ _)*_ (\
__=_*_ —-_) )^-!(_ -= _)
} { _ = ($__)<___
} END {
printf("\n\n\tFilter hit rate :: %.*f %% ( %\47.f / %\47.f )"\
" \n\n\t%*sFile :: %-.*s \n\n",
___=__--,__*_*__/(__=NR),_,__,
___,____,_^=_*=_ =_^=_<_, FILENAME) } ' my_file
Filter hit rate :: 0.04415484271 % ( 3,588 / 8,125,949 )
File :: myfile
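For comparison, here is a readable sketch that produces a similar report (hit rate, hits/total, file name). It is not a reconstruction of the code above: the column and threshold are passed in with -v, the sample file is hypothetical, and the thousands-separator formatting is left out.

```shell
#!/bin/sh
# Hypothetical sample: four tab-separated lines, value in column 11.
tmp=$(mktemp)
for v in 0.01 0.20 0.30 0.40; do
  printf '1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t%s\n' "$v"
done > "$tmp"

# col and limit are plain -v variables; limit looks numeric, so the
# comparison $col < limit is numeric.
report=$(awk -F'\t' -v col=11 -v limit=0.05 '
  $col < limit { hits++ }
  END {
    if (NR == 0) { print "no lines in " FILENAME; exit }
    printf "Filter hit rate :: %.4f %% ( %d / %d )\n", 100 * hits / NR, hits, NR
    printf "File :: %s\n", FILENAME
  }' "$tmp")

printf '%s\n' "$report"
rm -f "$tmp"
```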