I want to do a string replacement on any alphanumeric word, delimited by word boundaries, that is exactly 14 characters long. The word must contain at least one uppercase letter and one digit. I know (I think I know) that I'll need positive lookaheads for the uppercase letter and the digit. I am sure that I have the right regex pattern. What I don't understand is why sed is not matching. I have validated the pattern with online tools such as regexpal, and in those tools it matches the string as I expect.
Here is the regex and the sed command I'm using.
\b(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]{14}\b
The sed command I'm testing with is:
echo "asdfASDF1234ds" | sed 's/\b(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]{14}\b/NEW_STRING/g'
I would expect this to match the echoed string.
Answer:
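sed doesn't support lookaheads, or many other modern Perl-style regex features: it only understands POSIX basic regular expressions (or extended ones with -E), and neither flavor has lookahead. The simple fix is to use Perl.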
perl -pe 's/\b(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]{14}\b/NEW_STRING/g' <<< "asdfASDF1234ds"
Answer:
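sed understands a very limited form of regex; it does not have lookahead.
Using a tool with more powerful regex support is the simple solution.
If you must use sed, you could do something like the following (this relies on GNU sed, which treats \n as a newline both in replacements and inside bracket expressions):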
$ sed '
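# mark delimiters: surround every run of non-alphanumerics with newlines
s/[^a-zA-Z0-9]\{1,\}/\n&\n/g
# also mark the start and end of the line when a word touches them
s/^[^\n]/\n&/
s/[^\n]$/&\n/
# mark 14-character candidates (each is now surrounded by two newlines)
s/\n[a-zA-Z0-9]\{14\}\n/\n&\n/g
# mark if candidate contains a capital (now three newlines)
s/\n\n[^\n]*[A-Z][^\n]*\n\n/\n&\n/g
# check for a digit; if found, replace the whole marked candidate
s/\n\n\n[^\n]*[0-9][^\n]*\n\n\n/NEW_STRING/g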
# remove marks
s/\n//g
' <<'EOD'
a234567890123n
,a234567890123n,
xx,a234567890123n,yy
a23456789A123n
XX,a23456789A123n,YY
xx,a23456789A1234n,yy
EOD
a234567890123n
,a234567890123n,
xx,a234567890123n,yy
NEW_STRING
XX,NEW_STRING,YY
xx,a23456789A1234n,yy
$