I have a set of data that looks like this:
NK.Chr1:75500000-95000000:28960-29007 NG-unitig0655 97.872 47 1 0 1 47 121009 120963 2.90e-14 80.6
NK.Chr1:75500000-95000000:28960-29007 NG-1DRT-unitig0549 97.872 47 1 0 1 47 623680 623726 2.90e-14 80.6
NK.Chr1:75500000-95000000:28960-29007 NG-1DRT-unitig0278 97.872 47 1 0 1 47 1224581 1224627 2.90e-14 80.6
NK.Chr1:75500000-95000000:28960-29007 NG-1DRT-Chr4 97.872 47 1 0 1 47 8416368 8416414 2.90e-14 80.6
NK.Chr1:75500000-95000000:28960-29007 NG-1DRT-Chr4 97.872 47 1 0 1 47 20041035 20041081 2.90e-14 80.6
NK.Chr1:75500000-95000000:28960-29007 NG-1DRT-Chr4 97.872 47 1 0 1 47 35175472 35175426 2.90e-14 80.6
NK.Chr1:75500000-95000000:28960-29007 NG-1DRT-Chr4 97.872 47 1 0 1 47 56460095 56460049 2.90e-14 80.6
I need to keep only the lines whose range falls within 0-3900000, considering only the numbers right before NG.
grep 'NK.Chr1:75500000-95000000:[0-3900000]' NG.1DRT-blast.out > chr1-blast-NG.txt
I tried this command, but it returned all the lines containing NK.Chr1:75500000-95000000, without taking the range into account.
Does anyone know how to write a proper command for this?
CodePudding user response:
With your shown samples and attempts, please try the following awk code. Written and tested in GNU awk.
awk 'match($0,/NK.Chr1:75500000-95000000:([0-9]+)-([0-9]+)[[:space:]]+NG/,arr) && (arr[1] arr[2])+0<=3900000' Input_file
Explanation: This uses awk's match function with the regex NK.Chr1:75500000-95000000:([0-9]+)-([0-9]+)[[:space:]]+NG, which creates 2 capturing groups whose values are stored in an array named arr (the three-argument form of match is a GNU awk extension). In addition to match, an AND condition checks whether the value of the digits (with the - between them removed) is less than or equal to 3900000; if it is, the line is printed.
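For background, [0-3900000] in grep is a bracket expression that matches a single character (here one of 0, 1, 2, 3 or 9), not a numeric range, which is why the grep attempt matched every line. If the intent is that the start and end coordinates (the two numbers right before NG) must each lie within 0-3900000, a portable sketch that compares the two numbers separately, instead of joining them, could look like the following. This is only one reading of the requirement. The helper name filter_range and the file sample-blast.txt are assumptions made for illustration; the two rows are copied from the question.

```shell
# Two sample rows copied from the question (sample-blast.txt is a made-up name).
cat > sample-blast.txt <<'EOF'
NK.Chr1:75500000-95000000:28960-29007 NG-unitig0655 97.872 47 1 0 1 47 121009 120963 2.90e-14 80.6
NK.Chr1:75500000-95000000:28960-29007 NG-1DRT-Chr4 97.872 47 1 0 1 47 8416368 8416414 2.90e-14 80.6
EOF

# filter_range LO HI FILE  (hypothetical helper, not part of any tool)
# Prints lines whose first field starts with the NK.Chr1 prefix and whose
# trailing START-END pair satisfies LO <= START and END <= HI.
filter_range() {
  awk -v lo="$1" -v hi="$2" '
    index($1, "NK.Chr1:75500000-95000000:") == 1 {
      # Split field 1 on ":"; the third piece is e.g. "28960-29007".
      n = split($1, parts, ":")
      # Split that piece on "-" and compare both ends numerically.
      if (n == 3 && split(parts[3], range, "-") == 2 &&
          range[1] + 0 >= lo && range[2] + 0 <= hi)
        print
    }' "$3"
}

filter_range 0 3900000 sample-blast.txt > chr1-blast-NG.txt
```

Because this version uses only split and index, it should run in any POSIX awk, not just GNU awk. For the real data, replace sample-blast.txt with NG.1DRT-blast.out.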