How to build a command to filter an interval using grep on Linux

Time:07-29

I have a set of data that looks like this:

NK.Chr1:75500000-95000000:28960-29007   NG-unitig0655   97.872  47  1   0   1   47  121009  120963  2.90e-14    80.6
NK.Chr1:75500000-95000000:28960-29007   NG-1DRT-unitig0549  97.872  47  1   0   1   47  623680  623726  2.90e-14    80.6
NK.Chr1:75500000-95000000:28960-29007   NG-1DRT-unitig0278  97.872  47  1   0   1   47  1224581 1224627 2.90e-14    80.6
NK.Chr1:75500000-95000000:28960-29007   NG-1DRT-Chr4    97.872  47  1   0   1   47  8416368 8416414 2.90e-14    80.6
NK.Chr1:75500000-95000000:28960-29007   NG-1DRT-Chr4    97.872  47  1   0   1   47  20041035    20041081    2.90e-14    80.6
NK.Chr1:75500000-95000000:28960-29007   NG-1DRT-Chr4    97.872  47  1   0   1   47  35175472    35175426    2.90e-14    80.6
NK.Chr1:75500000-95000000:28960-29007   NG-1DRT-Chr4    97.872  47  1   0   1   47  56460095    56460049    2.90e-14    80.6

I need to keep only the lines whose interval falls in the range 0-3900000, considering only the numbers immediately before NG (for example, 28960-29007).

grep 'NK.Chr1:75500000-95000000:[0-3900000]' NG.1DRT-blast.out > chr1-blast-NG.txt

I tried this command, but it returned all the lines containing NK.Chr1:75500000-95000000, ignoring the range.

Does anyone know how to build a proper command for this?

CodePudding user response:

Based on your samples and attempt, please try the following awk code. It was written and tested in GNU awk.

awk 'match($0,/NK.Chr1:75500000-95000000:([0-9]+)-([0-9]+)[[:space:]]+NG/,arr) && (arr[1] arr[2])+0<=3900000' Input_file

Explanation: awk's match function is used with the regex NK.Chr1:75500000-95000000:([0-9]+)-([0-9]+)[[:space:]]+NG, which creates two capturing groups whose values are stored in the array named arr. An AND condition is then added to the match: if the digits, joined by removing the - between them, form a number less than or equal to 3900000, the line is printed.
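For context, grep treats [0-3900000] as a bracket expression that matches a single character (one of 0 to 3, 9, or 0), which is why the original attempt did not restrict anything numerically. Also note that joining the two captured numbers produces a value such as 2896029007 for the first sample row. If the intent is instead that both the start and the end of the interval lie within 0-3900000, one possible sketch is shown below. It is an assumption about what the question means, not the method used above. The function name filter_interval and its lo/hi arguments are hypothetical, and it uses only POSIX awk features, so it should not need GNU awk.

```shell
# Hypothetical helper (not from the original answer): print rows whose
# query interval, the last ":"-separated piece of column 1, lies in [lo, hi].
# Usage: filter_interval LO HI [FILE...]   (reads stdin when no FILE is given)
filter_interval() {
    lo=$1; hi=$2; shift 2
    awk -v lo="$lo" -v hi="$hi" '
    {
        n = split($1, part, ":")                  # part[n] is e.g. "28960-29007"
        if (split(part[n], iv, "-") != 2) next    # skip rows without start-stop
        first = iv[1] + 0                         # force numeric comparison
        last  = iv[2] + 0
        if (first >= lo && last <= hi) print
    }' "$@"
}
```

To restrict the input to the NK.Chr1:75500000-95000000 rows first, as in the question, the two steps can be chained: grep -F 'NK.Chr1:75500000-95000000:' NG.1DRT-blast.out | filter_interval 0 3900000 > chr1-blast-NG.txt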