I have a file of targets predicted by Diana, and I would like to extract those with values over 0.70:
>AAGACAACGUUUAAACCA|ENST00000367816|0.999999999975474
UTR3 693-701 0.00499294596715397
UTR3 1045-1053 0.405016433077734
>AAGACAACGUUUAAACCA|ENST00000392971|0.996695852735028
CDS 87-95 0.0112208345874892
I don't know why this script doesn't work, since it seems to be correct:
for file in SC*
do
grep ">" $file | awk 'BEGIN{FS="|"}{if($3 >= 0.70)}{print $2, $3}' > 70/$file.tab
done
The problem is that it doesn't filter anything. Can you help me find the error?
CodePudding user response:
For a start, that's not a valid awk script, since you have a misplaced } character:
BEGIN{FS="|"}{if($3 >= 0.70)}{print $2, $3}
# |
# -------------
# move here |
# V
BEGIN{FS="|"}{if($3 >= 0.70){print $2, $3}}
You also don't need grep, because awk can do that itself, and you can set the field separator without a BEGIN block. For example, here's a command that will output field 3 values greater than 0.997, on lines starting with > (using | as a field separator):
pax> awk -F\| '/^>/ && $3 > 0.997 { print $3 }' prog.in
0.999999999975474
I chose 0.997 to ensure one of the lines in your input file was filtered out for being too low (as proof that it works). For your desired behaviour, the command would be:
pax> awk -F\| '/^>/ && $3 > 0.7 { print $2, $3 }' prog.in
ENST00000367816 0.999999999975474
ENST00000392971 0.996695852735028
Keep in mind I've used > 0.7 as per your "values over 0.70" in the heading and text of your question. If you really mean "values 0.70 and above" as per the code in your question, simply change > into >=.
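Putting the pieces together, the loop from the question could look roughly like the sketch below. It is only an illustration: the temporary directory, the file name SC_sample and the last record (ENST00000000000 with a score of 0.25) are made up so that something actually gets filtered out, and it uses >= to match the code in the question.

```shell
# Work in a throwaway directory so no real files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input; the last record is invented for the demo.
cat > SC_sample <<'EOF'
>AAGACAACGUUUAAACCA|ENST00000367816|0.999999999975474
UTR3 693-701 0.00499294596715397
UTR3 1045-1053 0.405016433077734
>AAGACAACGUUUAAACCA|ENST00000392971|0.996695852735028
CDS 87-95 0.0112208345874892
>AAGACAACGUUUAAACCA|ENST00000000000|0.25
EOF

mkdir -p 70   # the output directory must exist before redirecting into it

for file in SC*
do
    # Keep header lines (starting with ">") whose third field is >= 0.70.
    awk -F'|' '/^>/ && $3 >= 0.70 { print $2, $3 }' "$file" > "70/$file.tab"
done

result=$(cat 70/SC_sample.tab)
printf '%s\n' "$result"
```

Quoting "$file" keeps the loop working for file names that contain spaces.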
CodePudding user response:
It looks like you are running a for loop that starts awk once per file, so a new awk process is launched for every file. You don't need to do that: awk can read all the matching files by itself. So, apart from fixing the typo in your awk program, pass all the files to a single awk invocation, like this:
awk -F\| 'FNR==1{close(out); out="70/"FILENAME".tab"} /^>/ && $3 > 0.7 { print $2,$3 > out }' SC*
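One difference from the original loop is worth noting: the shell redirection in the loop creates an output file for every input, even when no line passes the filter, whereas awk only creates a file once something is printed to it. The sketch below is a spelled-out variant of the one-liner that also creates the empty files. The input files SC_one and SC_two and their contents are hypothetical, and the explicit `printf "" > out` is my addition, not part of the command above.

```shell
# Work in a throwaway directory with two made-up input files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p 70   # awk will not create the output directory for you

printf '%s\n' '>SEQ|ID_A|0.95' 'UTR3 1-9 0.1' '>SEQ|ID_B|0.30' > SC_one
printf '%s\n' '>SEQ|ID_C|0.80' > SC_two

awk -F'|' '
    FNR == 1 {                        # first line of each input file
        if (out != "") close(out)     # finish the previous output file
        out = "70/" FILENAME ".tab"
        printf "" > out               # create the file even if nothing passes
    }
    /^>/ && $3 > 0.7 { print $2, $3 > out }
' SC*

first=$(cat 70/SC_one.tab)
second=$(cat 70/SC_two.tab)
printf '%s\n%s\n' "$first" "$second"
```

Closing each output file before opening the next keeps the number of open files small when SC* matches many inputs.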
CodePudding user response:
I think it's perhaps safer to filter with a regex in string mode, instead of numerically:
$3 !~/0[.][0-6]/
If awk interprets the input as a number and does a numeric comparison, the result is subject to the rounding errors of floating-point math. With a string-based filter, you avoid values above ~0.69999999999999995559107901… (approximately the IEEE 754 double-precision value of 7E-1) being rounded up.
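A small experiment can make the difference visible. This is a sketch under some assumptions: the record is invented, the score is deliberately written as text just below 0.7, and the regex is an anchored variant (with a leading ^) of the one above. It also assumes an awk that stores numbers as IEEE 754 doubles, which is the usual default.

```shell
# Hypothetical record: as text, the score is slightly below 0.7.
line='>SEQ|ID_X|0.6999999999999999999'

# Numeric comparison: awk converts the field to a double before comparing,
# and this text converts to the same double as the constant 0.7.
numeric=$(printf '%s\n' "$line" |
    awk -F'|' '{ print (($3 >= 0.7) ? "kept" : "dropped") }')

# String comparison: the regex only looks at the characters of the field.
textual=$(printf '%s\n' "$line" |
    awk -F'|' '{ print (($3 !~ /^0[.][0-6]/) ? "kept" : "dropped") }')

echo "numeric: $numeric, textual: $textual"
```

The effect only shows up with >= and with scores within roughly 1e-16 of the threshold, so for most inputs the numeric comparison is fine. The regex approach has its own limits: it assumes plain decimal notation, so a score written as 1e-05 would slip through.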