extract specific row with numbers over N

Time:11-10

I have a dataframe like this:

1  3  MAPQ=0;CT=3to5;SRMAPQ=60
2  34  MAPQ=60;CT=3to5;SRMAPQ=67
4  56  MAPQ=67;CT=3to5;SRMAPQ=50
5  7  MAPQ=44;CT=3to5;SRMAPQ=61

Using awk (or another tool), I want to extract only the rows where SRMAPQ is over 60.

The expected output is:

2  34  MAPQ=60;CT=3to5;SRMAPQ=67
5  7  MAPQ=44;CT=3to5;SRMAPQ=61

Update: the SRMAPQ field can appear anywhere in the line, not only at the end, e.g. MAPQ=44;CT=3to5;SRMAPQ=61;DT=3to5

CodePudding user response:

You don't have to extract the SRMAPQ value separately to do the comparison. If the format is fixed as above, with SRMAPQ as the last item on the line, just use = as the field separator and access the last field using $NF:

awk -F= '$NF > 60' file
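To see why $NF works for the fixed format, here is a minimal sketch using the sample rows from the question. The file name sample.txt is only an assumption for illustration.

```shell
# Write the sample rows from the question to a scratch file
# (sample.txt is a hypothetical name).
cat > sample.txt <<'EOF'
1  3  MAPQ=0;CT=3to5;SRMAPQ=60
2  34  MAPQ=60;CT=3to5;SRMAPQ=67
4  56  MAPQ=67;CT=3to5;SRMAPQ=50
5  7  MAPQ=44;CT=3to5;SRMAPQ=61
EOF

# With "=" as the separator, the last field is whatever follows the final "=".
# Print the field count and the last field of each row to inspect the split.
awk -F= '{ print NF, $NF }' sample.txt

# Keep only the rows whose last field is numerically greater than 60.
awk -F= '$NF > 60' sample.txt
```

This only holds while SRMAPQ is the last key on the line. Once another key follows it, $NF picks up that key's value instead.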

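Or, if SRMAPQ can occur anywhere in the line (as in the update), use a more generic approach: locate the field with match(), take the digits after "SRMAPQ=" with substr(), and add 0 so the value is compared as a number rather than as a string:

awk 'match($0, /SRMAPQ=[0-9]+/) { l = length("SRMAPQ="); if (substr($0, RSTART+l, RLENGTH-l) + 0 > 60) print }' file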
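The generic approach can also be sketched as a small shell function with the threshold as a parameter. The function name srmapq_over and the test rows are made up for illustration. The sketch assumes a POSIX awk, where match() sets RSTART and RLENGTH.

```shell
# Hypothetical helper: print lines whose SRMAPQ value exceeds a threshold.
# Usage: srmapq_over THRESHOLD [FILE...]   (reads stdin if no file is given)
srmapq_over() {
    threshold="$1"; shift
    awk -v min="$threshold" '
        match($0, /SRMAPQ=[0-9]+/) {
            # Skip the 7 characters of "SRMAPQ=" and keep only the digits.
            # Adding 0 forces a numeric value; substr() alone returns a string,
            # and a string compared with a number is compared as text.
            value = substr($0, RSTART + 7, RLENGTH - 7) + 0
            if (value > min) print
        }
    ' "$@"
}

# Example rows (invented) with SRMAPQ at different positions in the line.
printf '%s\n' \
    '5  7  MAPQ=44;CT=3to5;SRMAPQ=61;DT=3to5' \
    '6  9  SRMAPQ=100;MAPQ=70;CT=3to5' \
    '7  2  MAPQ=99;CT=3to5;SRMAPQ=7' | srmapq_over 60
```

Testing the value inside the match() action means a line without an SRMAPQ field is never printed, because no value from an earlier line is carried over.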