Why isn't this gawk gensub() behaving like regex101?

I have a gawk script that includes this line:

$0 = gensub(/{\+\+(.+?)\+\+}/, "{\\\\textcolor{added}{\\1}", "g", $0);

On the following input line

- {++first phrase++} swiftly followed {++by a second one++}.

it produces:

- \textcolor{added}{first phrase++} swiftly followed {++by a second one}}

not what I'm expecting:

- \textcolor{added}{first phrase} swiftly followed \textcolor{added}{by a second one}}

When I run the same regex in regex101.com or in the Mac Expressions app, it works as expected. What am I missing?

CodePudding user response：

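Notice which pairs of ++ were removed:

# from this:-                   {++first phrase++} swiftly followed {++by a second one++}.
#   to this: - \textcolor{added}{  first phrase++} swiftly followed {++by a second one  }}
                                 ^^                                                   ^^

This confirms barmar's comment about awk being greedy with matches: awk regexes have no non-greedy quantifiers, so the ? in .+? does not make the match lazy, and the match runs from the first {++ to the last ++}.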
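
To see the greedy behaviour in isolation, here is a minimal sketch using match(), which any POSIX awk should provide (no gawk needed). It assumes the markers are the {++ ... ++} pairs from the question; the shell variable names greedy and bounded are only for illustration.

```shell
line='- {++first phrase++} swiftly followed {++by a second one++}.'

# Greedy: .* takes the longest match, so it runs to the LAST ++} on the line.
greedy=$(printf '%s\n' "$line" |
  awk '{ if (match($0, /[{][+][+].*[+][+][}]/)) print substr($0, RSTART, RLENGTH) }')

# Negated class: [^+]* cannot cross a +, so the match stops at the FIRST ++}.
bounded=$(printf '%s\n' "$line" |
  awk '{ if (match($0, /[{][+][+][^+]*[+][+][}]/)) print substr($0, RSTART, RLENGTH) }')

printf '%s\n' "$greedy" "$bounded"
```

The first line printed spans both phrases; the second covers only {++first phrase++}.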

A couple small changes to the current code:

# current:
# $0 = gensub(/{\+\+(.+?)\+\+}/  , "{\\\\textcolor{added}{\\1}", "g", $0)

# new
  $0 = gensub(/{\+\+([^+]*)\+\+}/,  "\\\\textcolor{added}{\\1}", "g", $0)

Where:

replaced .+? with [^+]* to emulate a non-greedy match (the class cannot run past the next +)
removed the leading { from the replacement string, as it doesn't show up in OP's expected output

Taking for a test drive:

echo '- {++first phrase++} swiftly followed {++by a second one++}.' |
awk '{$0 = gensub(/{\+\+([^+]*)\+\+}/, "\\\\textcolor{added}{\\1}", "g", $0)} 1'

This generates:

- \textcolor{added}{first phrase} swiftly followed \textcolor{added}{by a second one}.

CodePudding user response：

If you're not using GNU awk (gawk) but want to emulate a similar feature, something like this:

- {++first phrase++} swiftly followed {++by a second one++}.

mawk 'gsub(______, ___ "&" )       gsub(_____, __ "&" )     \
      gsub(__ "[^" (__ ___) "]+" ___, (____)__  "&" ___)      \
      gsub((__)(__) (_____)  "|" (___)(___) "[+][+]", _)      \
      gsub(__,   "\173++")          gsub(___, "++\175")    1 ' FS='^$' \
              __='\6\31'  _____='[{][+][+]' ____='\134textcolor{added}{' \
             ___='\1\36' ______='[+][+][}]'

- \textcolor{added}{first phrase} swiftly followed \textcolor{added}{by a second one}.

Yes, it's very verbose (an unfortunate downside).

I had to play it safe by double-checking for an isolated { or } without a matching pair and making sure their original state is properly restored at the cleanup stage,
on top of wiping all remnants of the temporary separator bytes (taken from the [[:cntrl:]] range) that were inserted in lieu of a costly array split.
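
For comparison, a more readable way to get the same effect without gensub() might look like the sketch below. It is not the code above, just an alternative under stated assumptions: the markers are the literal strings {++ and ++}, and mark_added is a hypothetical helper name. It uses only index() and substr(), so it should run on mawk or any POSIX awk, and an opener with no matching closer is left untouched.

```shell
line='- {++first phrase++} swiftly followed {++by a second one++}.'

result=$(printf '%s\n' "$line" | awk '
# Hypothetical helper: rewrite every {++text++} as \textcolor{added}{text}.
function mark_added(s,    out, i, j) {
    out = ""
    # index() searches for literal substrings, so no regex escaping is needed.
    while ((i = index(s, "{++")) > 0) {
        j = index(substr(s, i + 3), "++}")   # first closer after this opener
        if (j == 0) break                    # unmatched opener: keep the rest as-is
        out = out substr(s, 1, i - 1) "\\textcolor{added}{" substr(s, i + 3, j - 1) "}"
        s = substr(s, i + j + 5)             # resume just past the closing ++}
    }
    return out s
}
{ print mark_added($0) }')

printf '%s\n' "$result"
```

Because the closer is located with index() rather than a negated character class, a single + inside a phrase (for example {++a + b++}) does not stop the match early.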
