I have data like this:
1 3 MAPQ=0;CT=3to5;SRMAPQ=60
2 34 MAPQ=60;CT=3to5;SRMAPQ=67
4 56 MAPQ=67;CT=3to5;SRMAPQ=50
5 7 MAPQ=44;CT=3to5;SRMAPQ=61
Using awk (or another tool), I want to extract only the rows where SRMAPQ is over 60.
So the expected output is:
2 34 MAPQ=60;CT=3to5;SRMAPQ=67
5 7 MAPQ=44;CT=3to5;SRMAPQ=61
Update: the SRMAPQ entry can be anywhere in the line, e.g. MAPQ=44;CT=3to5;SRMAPQ=61;DT=3to5
Answer:
You don't have to extract the SRMAPQ value separately and then compare it. If the format is fixed as above (SRMAPQ is always the last key), just use = as the field separator and compare the last field, $NF. Because $NF looks numeric, awk compares it numerically with 60:
awk -F= '$NF > 60' file
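A minimal, self-contained sketch that runs the one-liner above on the sample rows from the question. The function name filter_last_field is hypothetical, and the sketch assumes SRMAPQ is the last key on every line.

```shell
#!/bin/sh
# Sketch: keep rows whose last '='-separated field is greater than 60.
# Assumes SRMAPQ is always the last key on the line (hypothetical helper name).
filter_last_field() {
  awk -F= '$NF > 60' "$@"
}

# Demo on the sample rows from the question, read from a here-document.
filter_last_field <<'EOF'
1 3 MAPQ=0;CT=3to5;SRMAPQ=60
2 34 MAPQ=60;CT=3to5;SRMAPQ=67
4 56 MAPQ=67;CT=3to5;SRMAPQ=50
5 7 MAPQ=44;CT=3to5;SRMAPQ=61
EOF
```

This prints the rows with SRMAPQ=67 and SRMAPQ=61. With a line like MAPQ=44;CT=3to5;SRMAPQ=61;DT=3to5, the last field becomes 3to5, which is compared as a string, so the result no longer reflects SRMAPQ and the row is dropped.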
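Or, if SRMAPQ can occur anywhere in the line (as updated in the comments), use a more generic approach with match():
awk 'match($0, /SRMAPQ=[0-9]+/) { l = length("SRMAPQ="); v = substr($0, RSTART + l, RLENGTH - l) + 0; if (v > 60) print }' file
match() sets RSTART and RLENGTH to the position and length of the matched text, so the substr() call skips the "SRMAPQ=" prefix. Adding 0 forces a numeric comparison (otherwise "100" > 60 would be compared as strings), and doing the test inside the action avoids reusing v from a previous line when a line has no SRMAPQ.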
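If you would rather not rely on a regex match, another option is to split the key=value column on ";" and compare the key exactly. This is a sketch, not the answer's method: it assumes the key=value list is the third whitespace-separated column (as in the sample data), and filter_srmapq is a hypothetical name.

```shell
#!/bin/sh
# Sketch: keep rows whose SRMAPQ value is greater than 60, wherever SRMAPQ
# appears in the ;-separated key=value list. Assumes that list is column 3.
filter_srmapq() {
  awk '{
    n = split($3, kv, ";")          # break column 3 into key=value items
    keep = 0                        # reset per line so no value leaks from a previous row
    for (i = 1; i <= n; i++) {
      eq = index(kv[i], "=")
      if (eq && substr(kv[i], 1, eq - 1) == "SRMAPQ") {
        keep = (substr(kv[i], eq + 1) + 0 > 60)   # "+ 0" forces a numeric comparison
        break
      }
    }
  } keep' "$@"
}

# Demo on the sample rows, including the updated line with a trailing DT key.
filter_srmapq <<'EOF'
1 3 MAPQ=0;CT=3to5;SRMAPQ=60
2 34 MAPQ=60;CT=3to5;SRMAPQ=67
4 56 MAPQ=67;CT=3to5;SRMAPQ=50
5 7 MAPQ=44;CT=3to5;SRMAPQ=61;DT=3to5
EOF
```

Because the key is compared exactly, MAPQ=60 is never mistaken for SRMAPQ, and a hypothetical key such as XSRMAPQ would not be picked up either. Lines without an SRMAPQ key are skipped.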