Match substring of column 2 with column 1 using awk

Time:06-18

How can I check whether the first 2 characters of column 1 match the 5th and 6th characters of the string in column 2? I tried an approach using substr, but as I am new to bash scripting I am not sure how to extract characters from the middle of a string.

my code

awk 'BEGIN{OFS=FS="|"} { $2!="" str=substr($2, length($2) -7,9) 
if ( $1 ~ /^str/) print}' file 

cat file

CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
9785764654|HDFCCZPXXXX|United Kingdom|84320
LU987876986576|BSUILUPXXXX|Australia|8765
YZ654S|BSUIDEPXXXX|Germany|98744
QA76465346||Qatar|9877654
GB875765||Europe|98679867

expected output :

CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
LU987876986576|BSUILUPXXXX|Australia|8765

Note: when column 2 is present, its length is always 11, as it is a BIC.

CodePudding user response:

One awk idea:

$ awk -F'|' 'substr($1,1,2) == substr($2,5,2)' file
CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
LU987876986576|BSUILUPXXXX|Australia|8765

If the two substr() calls return the same string, the test evaluates as 'true' and the current line of input is passed to stdout (i.e., the current line of input is printed).
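To see what is actually being compared, here is a small sketch that prints both substrings next to column 1. The function name show_keys is made up for illustration and is not part of the answer above; the rows are taken from the question.

```shell
#!/bin/sh
# Hypothetical helper: print "first 2 chars of $1 | chars 5-6 of $2 | $1"
# so the two values compared by the one-liner are visible per record.
show_keys() {
  awk -F'|' '{ printf "%s|%s|%s\n", substr($1, 1, 2), substr($2, 5, 2), $1 }' "$@"
}

printf '%s\n' \
  'CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765' \
  'YZ654S|BSUIDEPXXXX|Germany|98744' \
  'QA76465346||Qatar|9877654' | show_keys
# CZ|CZ|CZ987876654534
# YZ|DE|YZ654S
# QA||QA76465346
```

For an empty column 2, substr() returns an empty string, so the row is not printed as long as column 1 has at least two characters.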

CodePudding user response:

Using sed

$ sed -n '/^\(..\)[^|]*|....\1/p' input_file
CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
LU987876986576|BSUILUPXXXX|Australia|8765

CodePudding user response:

Using match function in GNU awk.

awk 'match($0,/^(..)[^|]*\|.{4}(..)/,arr) && arr[1] == arr[2]' Input_file

Explanation: match() tests the line against the regex ^(..)[^|]*\|.{4}(..) (broken down below), and GNU awk stores the text of the two capturing groups in arr[1] and arr[2]. The && condition then checks whether arr[1] is equal to arr[2]. No print is written because awk works in condition/action pairs: when a condition is met and no action is given, the current line is printed.
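A quick way to convince yourself of the "no action means print" rule is to run the same condition with and without an explicit action. This sketch uses the substr() test instead of match() only so that it runs in any awk, not just GNU awk; the variable names are just for illustration.

```shell
#!/bin/sh
# Two sample rows from the question: one that should match, one that should not.
rows='CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
YZ654S|BSUIDEPXXXX|Germany|98744'

# Condition only: awk applies the default action and prints the line.
implicit=$(printf '%s\n' "$rows" | awk -F'|' 'substr($1,1,2) == substr($2,5,2)')

# Same condition with the action written out.
explicit=$(printf '%s\n' "$rows" | awk -F'|' 'substr($1,1,2) == substr($2,5,2) { print $0 }')

printf '%s\n' "$implicit"
# CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
```

Both variables end up holding the same single line.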

Explanation of regex:

^(..)       ##From the start of the line, match any 2 characters and keep them in the 1st capturing group.
[^|]*\|.{4} ##Match everything up to the 1st occurrence of |, then the | itself and any 4 characters.
(..)        ##2nd capturing group, which captures the next 2 characters.
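
The three-argument match() is a GNU awk extension. If your awk does not have it, the same walk through the line can be sketched with index() and substr(). The function name match_prefix is made up, and the sketch assumes column 1 is at least two characters long (so that the first 2 characters never include a |).

```shell
#!/bin/sh
# Hypothetical portable stand-in for the gawk match(..., arr) one-liner.
match_prefix() {
  awk '{
    bar = index($0, "|")            # position of the 1st "|"
    if (bar <= 2) next              # assumption: column 1 has 2+ characters
    g1 = substr($0, 1, 2)           # plays the role of arr[1]
    g2 = substr($0, bar + 5, 2)     # skip 4 characters after "|"; plays arr[2]
    if (length(g2) == 2 && g1 == g2) print
  }' "$@"
}

match_prefix <<'EOF'
CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
9785764654|HDFCCZPXXXX|United Kingdom|84320
LU987876986576|BSUILUPXXXX|Australia|8765
YZ654S|BSUIDEPXXXX|Germany|98744
QA76465346||Qatar|9877654
GB875765||Europe|98679867
EOF
# CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
# LU987876986576|BSUILUPXXXX|Australia|8765
```

Like the regex, this counts 4 characters after the first | on the whole line, so with an empty column 2 it looks into column 3 rather than stopping at the next |.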