I have a gawk script that includes this line:
$0 = gensub(/{\ \ (.+?)\ \ }/, "{\\\\textcolor{added}{\\1}", "g", $0);
On the following input line
- {  first phrase  } swiftly followed {  by a second one  }.
it produces:
- \textcolor{added}{first phrase  } swiftly followed {  by a second one}}
rather than what I'm expecting:
- \textcolor{added}{first phrase} swiftly followed \textcolor{added}{by a second one}}
When I run the same regex in regex101.com or in the Mac Expressions app, it works as expected. What am I missing?
CodePudding user response:
Notice which pairs of spaces were removed:
# from this: - {  first phrase  } swiftly followed {  by a second one  }.
#               ^^                                                   ^^
# to this:   - \textcolor{added}{first phrase  } swiftly followed {  by a second one}}
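This confirms barmar's comment about awk and non-greedy matching: awk uses POSIX extended regular expressions, which have no lazy quantifiers, so the trailing ? does not make the quantifier lazy. The match is always leftmost-longest, and here it runs from the first opening brace to the last closing brace.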
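As a rough illustration of that greedy behaviour, the sketch below uses POSIX awk's match() to print exactly what a greedy pattern matches on the sample line. It assumes the input has two spaces inside each brace, as the regex requires. show_match is a hypothetical helper name, and a plain .+ stands in for the quantifier from the question.

```shell
# show_match is a hypothetical helper, not part of awk.
# It prints the substring that the greedy pattern matches on each input line.
show_match() {
  awk '{
    # match() sets RSTART and RLENGTH to the leftmost-longest match
    if (match($0, /[{]  .+  [}]/))
      print substr($0, RSTART, RLENGTH)
  }'
}

printf '%s\n' '- {  first phrase  } swiftly followed {  by a second one  }.' | show_match
# prints: {  first phrase  } swiftly followed {  by a second one  }
```

The single match covers both brace groups, which is why only the outermost pair of double spaces disappears in the question's output.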
A couple small changes to the current code:
# current:
# $0 = gensub(/{\ \ (.+?)\ \ }/, "{\\\\textcolor{added}{\\1}", "g", $0)
# new:
$0 = gensub(/{\ \ ([^}]*)\ \ }/, "\\\\textcolor{added}{\\1}", "g", $0)
Where:
- replaced .+? with [^}]* so a match can never run past a closing brace, which emulates a non-greedy match
- removed the leading { from the replacement string, as it doesn't show up in OP's expected output
Taking for a test drive:
echo '- {  first phrase  } swiftly followed {  by a second one  }.' |
awk '{$0 = gensub(/{\ \ ([^}]*)\ \ }/, "\\\\textcolor{added}{\\1}", "g", $0)} 1'
This generates:
- \textcolor{added}{first phrase} swiftly followed \textcolor{added}{by a second one}.
CodePudding user response:
If you're using a non-GNU awk (i.e. not gawk) but want to emulate a similar feature, something like this works:
echo '- {  first phrase  } swiftly followed {  by a second one  }.' |
mawk 'gsub(______, ___ "&") gsub(_____, __ "&") \
      gsub(__ "[^" (__ ___) "]+" ___, (____)__ "&" ___) \
      gsub((__)(__ _____) "|" (___)(___) "[ ][ ]", _) \
      gsub(__, "\173 ") gsub(___, " \175") 1' FS='^$' \
      __='\6\31' _____='[{][ ][ ]' ____='\134textcolor{added}{' \
      ___='\1\36' ______='[ ][ ][}]'
This generates:
- \textcolor{added}{first phrase} swiftly followed \textcolor{added}{by a second one}.
Yes, it's very verbose (an unfortunate downside). I had to play it safe by double-checking for an isolated { or } without a matching pair and ensuring that their original states are properly restored at the cleanup stage, on top of wiping all remnants of the temporary SEP combo bytes in the [[:cntrl:]] range that were inserted in lieu of a costly array split.
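For comparison, here is a minimal sketch of a different portable approach: a match()/substr() loop in POSIX awk instead of the marker-byte technique above. It assumes each group is written as an opening brace, two spaces, the text, two spaces, and a closing brace, and that groups are not nested. wrap_added is a hypothetical helper name.

```shell
# wrap_added is a hypothetical helper, not part of any awk distribution.
# It rewrites every "{  text  }" group as "\textcolor{added}{text}".
wrap_added() {
  awk '{
    out = ""
    rest = $0
    # [^}]* cannot cross a closing brace, so each match stays within one group
    while (match(rest, /[{]  [^}]*  [}]/)) {
      # skip the 3 leading characters "{  " and drop the 3 trailing "  }"
      inner = substr(rest, RSTART + 3, RLENGTH - 6)
      out = out substr(rest, 1, RSTART - 1) "\\textcolor{added}{" inner "}"
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}

printf '%s\n' '- {  first phrase  } swiftly followed {  by a second one  }.' | wrap_added
# prints: - \textcolor{added}{first phrase} swiftly followed \textcolor{added}{by a second one}.
```

Because the result is built by string concatenation rather than passed as a gsub() replacement, the backslash needs only one level of escaping. Braces that do not form a complete group are left untouched.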