How can I check whether the first 2 characters of column 1 in a record match the 5th and 6th characters of the string in column 2? I tried an approach using substr, but as I am new to bash scripting I am not sure how to extract characters from the middle of a string.
My code:
awk 'BEGIN{OFS=FS="|"} { $2!="" str=substr($2, length($2) -7,9)
if ( $1 ~ /^str/) print}' file
cat file
CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
9785764654|HDFCCZPXXXX|United Kingdom|84320
LU987876986576|BSUILUPXXXX|Australia|8765
YZ654S|BSUIDEPXXXX|Germany|98744
QA76465346||Qatar|9877654
GB875765||Europe|98679867
Expected output:
CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
LU987876986576|BSUILUPXXXX|Australia|8765
Note: when column 2 is populated, its length is always 11, as it is a BIC.
CodePudding user response:
One awk idea:
$ awk -F'|' 'substr($1,1,2) == substr($2,5,2)' file
CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
LU987876986576|BSUILUPXXXX|Australia|8765
If the two substr() calls return the same string then the test evaluates as 'true' and the current line of input is passed to stdout (i.e., the current line of input is printed).
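As a self-contained sketch of the same substr() comparison, the snippet below feeds the sample records from the question through awk. The `length($2) == 11` guard is an assumption added here, not part of the answer above: it skips rows with an empty BIC and avoids a false match if both substrings happened to be empty. The variable names `data` and `result` are only illustrative.

```shell
# Sample records copied from the question (pipe-delimited).
data='CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
9785764654|HDFCCZPXXXX|United Kingdom|84320
LU987876986576|BSUILUPXXXX|Australia|8765
YZ654S|BSUIDEPXXXX|Germany|98744
QA76465346||Qatar|9877654
GB875765||Europe|98679867'

# substr(s, start, len): characters 1-2 of column 1 vs characters 5-6 of column 2.
# The length check (an added assumption) only lets through rows with an 11-character BIC.
result=$(printf '%s\n' "$data" |
  awk -F'|' 'length($2) == 11 && substr($1, 1, 2) == substr($2, 5, 2)')

printf '%s\n' "$result"
```

With the sample data this should print the same two CZ and LU records as the one-liner above.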
CodePudding user response:
Using sed
$ sed -n '/^\(..\)[^|]*|....\1/p' input_file
CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
LU987876986576|BSUILUPXXXX|Australia|8765
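The sed approach relies on a back-reference, which may be easier to follow when the pattern is taken apart. The sketch below runs it against the sample records from the question. It anchors the pattern with `^` so that the captured pair can only come from the start of the line; without an anchor, a pair found later in column 1 could also satisfy the back-reference. The variable names `data` and `result` are only illustrative.

```shell
# Sample records copied from the question (pipe-delimited).
data='CZ987876654534|HDFCCZPXXXX|Czech Republic|1243765
9785764654|HDFCCZPXXXX|United Kingdom|84320
LU987876986576|BSUILUPXXXX|Australia|8765
YZ654S|BSUIDEPXXXX|Germany|98744
QA76465346||Qatar|9877654
GB875765||Europe|98679867'

# Pattern pieces (basic regular expression syntax):
#   ^\(..\)   capture the first 2 characters of the line as group 1
#   [^|]*     skip the rest of column 1
#   |         the literal field separator
#   ....      the first 4 characters of column 2
#   \1        the next 2 characters must equal group 1
# -n suppresses default output; the p command prints only matching lines.
result=$(printf '%s\n' "$data" | sed -n '/^\(..\)[^|]*|....\1/p')

printf '%s\n' "$result"
```

Note that `.` also matches `|`, so the four skipped characters are not strictly confined to column 2; with the sample data this makes no difference.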
CodePudding user response:
Using the match function in GNU awk:
awk 'match($0,/^(..)[^|]*\|.{4}(..)/,arr) && arr[1] == arr[2]' Input_file
Explanation: match() is called with the regex ^(..)[^|]*\|.{4}(..) (broken down below). Its 2 capturing groups are stored as elements 1 and 2 of the array arr; passing an array as the third argument to match() is a GNU awk extension. The && condition then checks whether arr[1] is equal to arr[2]. No print is written out, since awk works on condition/action pairs: when a condition is true and no action is given, the current line is printed.
Explanation of regex:
^(..) ##From the start of the line, match any 2 characters and keep them in the 1st capturing group.
[^|]*\|.{4} ##Match everything before the 1st occurrence of |, then the | itself, then any 4 characters.
(..) ##2nd capturing group, which captures the next 2 characters.