I was wondering if there is a generic way to extract a specific string, which by design is an eleven-character alphanumeric string, using awk. For example:
cat ext.txt
This is a sample field where the code is MGTCBEBEECL for NR
This is a sample field where the code is MGTCBEBEE01 for NR
This field must be 030 when Rule_1 = 'FR' and Rule_2 is 'EUROFRANSBI' or 'EURO_NEAR' and code is PARBFRPPXXX
This field must be 0186 when Rule_1 = 'FR' and Rule_2 is 'EUROFRANSBI' or 'EURO_NEAR' and code is CITIFRPPXXX for the NR
For NFNC with Rule_1 is CA and Rule_2 is Universal and business code is null and official code must be 'CIBCCATTXXX'
I want to extract only the codes:
MGTCBEBEECL
MGTCBEBEE01
PARBFRPPXXX
CITIFRPPXXX
CIBCCATTXXX
There are almost 100 such lines from which I am hoping to extract these distinct strings, but I am at my wits' end as to how to make the extraction generic and non-redundant, hence seeking this community's assistance!
CodePudding user response:
With the current examples you can do it with grep, like this:
<ext.txt grep -oE "(code is|code must be) '?[A-Z0-9]{11}'?" |
tr -d "'" |
grep -o '[^ ]*$'
Output:
MGTCBEBEECL
MGTCBEBEE01
PARBFRPPXXX
CITIFRPPXXX
CIBCCATTXXX
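Since the question asked for awk, the same context-based idea can be sketched in plain awk. This is only a sketch under assumptions: `extract_codes` is a hypothetical helper name, the codes are assumed to use only uppercase letters and digits, and a code is assumed to follow "code is" or "code must be" directly, as in the sample lines. The length test replaces the `{11}` interval so the script does not depend on interval support in the awk at hand.

```shell
# Sample input copied from the question; ext.txt is the asker's file name.
cat > ext.txt <<'EOF'
This is a sample field where the code is MGTCBEBEECL for NR
This is a sample field where the code is MGTCBEBEE01 for NR
This field must be 030 when Rule_1 = 'FR' and Rule_2 is 'EUROFRANSBI' or 'EURO_NEAR' and code is PARBFRPPXXX
This field must be 0186 when Rule_1 = 'FR' and Rule_2 is 'EUROFRANSBI' or 'EURO_NEAR' and code is CITIFRPPXXX for the NR
For NFNC with Rule_1 is CA and Rule_2 is Universal and business code is null and official code must be 'CIBCCATTXXX'
EOF

# extract_codes is a hypothetical helper name, not part of any tool.
extract_codes() {
  awk -v q="'" '{
    gsub(q, "")    # drop the single quotes; awk re-splits the fields afterwards
    for (i = 2; i < NF; i++) {
      # the words just before the candidate must be "code is" or "code must be"
      after_is   = ($(i-1) == "code" && $i == "is")
      after_must = (i > 2 && $(i-2) == "code" && $(i-1) == "must" && $i == "be")
      c = $(i+1)
      # exactly 11 characters, none of them outside A-Z and 0-9
      if ((after_is || after_must) && length(c) == 11 && c !~ /[^A-Z0-9]/) {
        print c
      }
    }
  }' "$1"
}

extract_codes ext.txt
```

Because the context words are checked, 'EUROFRANSBI' (which follows "Rule_2 is") is not reported even though it is also eleven uppercase letters.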
CodePudding user response:
Using gawk:
gawk -F "[ ']" 'BEGIN{ r=@/[A-Z]{11}/ }r{ for (i=1; i<=NF; i++){ if($i~r) print $i} }' ext.txt
-F "[ ']": use a space or ' as the field separator (to also find codes like 'CIBCCATTXXX').
r=@/[A-Z]{11}/: assign the regular expression to a variable (because it's used twice in the script).
for(...: loop over all the fields in a line, and print a field when it matches the regular expression.
output (the letters-only pattern [A-Z]{11} misses MGTCBEBEE01 and also matches EUROFRANSBI):
MGTCBEBEECL
EUROFRANSBI
PARBFRPPXXX
EUROFRANSBI
CITIFRPPXXX
CIBCCATTXXX
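A possible variation on the field loop, sketched here under assumptions rather than offered as the answerer's method: accept digits as well as letters, keep only the last candidate on each line (assuming, as in the samples, that the wanted code comes after any Rule_2 value), and print each distinct code once. `last_code_per_line` and `sample.txt` are hypothetical names, and the script avoids gawk-only features.

```shell
# Three lines from the question plus one repeat, to exercise the de-duplication.
cat > sample.txt <<'EOF'
This is a sample field where the code is MGTCBEBEE01 for NR
This field must be 030 when Rule_1 = 'FR' and Rule_2 is 'EUROFRANSBI' or 'EURO_NEAR' and code is PARBFRPPXXX
For NFNC with Rule_1 is CA and Rule_2 is Universal and business code is null and official code must be 'CIBCCATTXXX'
This is a sample field where the code is MGTCBEBEE01 for NR
EOF

# last_code_per_line is a hypothetical helper name.
last_code_per_line() {
  awk -F "[ ']" '{
    code = ""
    for (i = 1; i <= NF; i++) {
      # exactly 11 characters, none of them outside A-Z and 0-9
      if (length($i) == 11 && $i !~ /[^A-Z0-9]/) {
        code = $i    # later candidates overwrite earlier ones
      }
    }
    # print the last candidate on the line, once per distinct value
    if (code != "" && !seen[code]++) {
      print code
    }
  }' "$1"
}

last_code_per_line sample.txt
```

The "last candidate wins" rule is a heuristic that fits the sample lines; if a line could end with some other eleven-character uppercase token, a context check on the preceding words would be the safer choice.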
CodePudding user response:
There is a way with GNU awk using FPAT:
awk -v FPAT='[[:alnum:]]{11}' '{print $NF}' file
MGTCBEBEECL
MGTCBEBEE01
PARBFRPPXXX
CITIFRPPXXX
CIBCCATTXXX
- Setting FPAT to '[[:alnum:]]{11}' tells GNU awk to treat each run of eleven alphanumeric characters as a field.
- {print $NF} then prints the last such field on each line.