awk select rows that has column 1 partially matching column 2-CodePudding

I have an "example_file" like below,

a2022.1 a2022.1 80
a2022.1 a2022.2 90
a2022.1 a2023.1 80
a2022.2 a2022.2 90
a2022.2 a2023.1 40
a2022.3 a2022.1 50
b20225.1 a2022.1 80

and I would like to select lines which has the third column > 80 and also has the 1st column (before the dot) different from the 2nd column (before the dot).

So the desired result would be

a2022.1 a2023.1 80
b20225.1 a2022.1 80

So I already managed to exclude the lines with 3rd column <80 and with the same 1st and 2nd column by using the line below

awk '($3 > 80 && $1!=$2)' example_file

and I tried awk '($3 > 80 && $1!~$2)' example_file, but it does not exclude the line

a2022.1 a2022.2 90

I thought it could be easy, but I can't figure it out. Could somebody help? Thanks!

CodePudding user response：

You may use this awk:

awk -F '[.[:blank:]]' '$NF >= 80 && $1 != $3' file

a2022.1 a2023.1 80
b20225.1 a2022.1 80

Here -F '[.[:blank:]]' sets input field separator to a dot or whitespace character. This allows us to use parts of first column as $1 and $2 and second column as $3 and $4.

Alternatively using gnu-awk:

awk '$3 >= 80 && 
    gensub(/\..*/, "", "1", $1) != gensub(/\..*/, "", "1", $2)' file

a2022.1 a2023.1 80
b20225.1 a2022.1 80

CodePudding user response：

1st solution: With your shown samples, please try following awk code. Using split function of awk here to get exact values from fields and compare them accordingly.

awk '
{
  split($1,arr1,".")
  split($2,arr2,".")
}
$3>=80 && arr1[1]!=arr2[1]
'  Input_file

2nd solution: With GNU awk using it's match function please try following. Using regex ^.([^.]*)\..\s .([^.]*)\..\s (\S ) for getting values as per requirement into the array(through capturing groups) and comparing them as per conditions in further condition.

awk '
match($0,/^.([^.]*)\..\s .([^.]*)\..\s (\S )/,arr) && (arr[3]>=80 && (arr[1]!=arr[2]))
'  Input_file

CodePudding user response：

Alternatively, you can use rquery (https://github.com/fuyuncat/rquery/releases) to do a straight forward query.

$./rq -q "p d/ /r | s @raw | f @3>=80 and substr(@1,0,instr(@1,'.'))!=substr(@2,0,instr(@2,'.'))" samples/example_file
a2022.1 a2023.1 80
b20225.1 a2022.1 80