regex non-fixed-width lookaround

Time:10-05

I am trying to match curly quotes that are inside shortcodes and replace them with normal quotes but leave the ones outside.

Here is an example content:

“foobar” [tagname id=“1035” linked=“true”] “another” [tagname id=“1”]

Should output the following:

“foobar” [tagname id="1035" linked="true"] “another” [tagname id="1"]

It can be a PCRE or JavaScript regex. Any suggestions are appreciated.

CodePudding user response:

When replacing within substrings that match some pattern, it is often more efficient and more convenient to use a callback, if one is available. For example, in PHP with preg_replace_callback:

$res = preg_replace_callback('~\[[^\]\[]*\]~', function($m) {
  return str_replace(['“','”'], '"', $m[0]);
}, $str);

This pattern matches an opening square bracket, followed by any number of characters that are not square brackets, followed by a closing square bracket. The callback function then replaces the curly quotes within each match.

Here is a PHP demo at tio.run. This can easily be translated to JS with the replace function (demo).

let res = str.replace(/\[[^\]\[]*\]/g, m => { return m.replace(/[“”]/g,'"'); });
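To try the JS version on the example from the question, here is a minimal self-contained sketch. The function name straightenShortcodeQuotes is only an illustrative choice, not something from the answer; the pattern itself is the one shown above.

```javascript
// Hypothetical helper name; the regex is the one from the answer above.
function straightenShortcodeQuotes(str) {
  // Match each [ ... ] span that contains no nested square brackets,
  // then replace curly double quotes only inside that span.
  return str.replace(/\[[^\]\[]*\]/g, m => m.replace(/[“”]/g, '"'));
}

const input = '“foobar” [tagname id=“1035” linked=“true”] “another” [tagname id=“1”]';
console.log(straightenShortcodeQuotes(input));
```

Because the outer pattern requires both an opening and a closing bracket, quotes outside a complete [ ... ] span are left untouched.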

Without a callback, PCRE/PHP can also use the \G anchor to continue where the previous match ended. This chains matches to an opening square bracket (without checking for a closing one).

$res = preg_replace('~(?:\G(?!^)|\[)[^“”\]\[]*\K[“”]~u', '"', $str);

See this demo at regex101 or another PHP demo at tio.run

(?!^) prevents \G from matching at the start of the string (its default behavior). \K resets the beginning of the reported match, so only the quote itself is replaced.


For completeness, another method is to use a lookahead at each quote to check whether a closing ] lies ahead without any other square brackets in between: [“”](?=[^\]\[]*\])
This does not check for an opening [ and works in all regex flavors that support lookaheads.
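The lookahead variant can be sketched in JS as follows. The names insideBrackets and fixQuotesWithLookahead are assumptions made for illustration; only the pattern comes from the text above.

```javascript
// Replace a curly quote only when a closing ] follows with no other
// square bracket in between. An opening [ is not checked.
const insideBrackets = /[“”](?=[^\]\[]*\])/g;

// Hypothetical helper name, used only for this sketch.
function fixQuotesWithLookahead(str) {
  return str.replace(insideBrackets, '"');
}

const sample = '“foobar” [tagname id=“1035” linked=“true”] “another” [tagname id=“1”]';
console.log(fixQuotesWithLookahead(sample));
```

Since the opening bracket is not verified, a stray ] after plain text (for example '“x” ]') would also have its preceding quotes converted. If that can occur in the input, the callback approach is the safer choice.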

CodePudding user response:

Since this is a little tricky, here is my contribution.

We can:

  • Match strings of the form =“some_chars”.

  • Since there is the additional constraint of matching only inside square brackets, use a positive lookahead (?=...) to match the above only if a closing square bracket follows later on the same line (since the string is uniformly formed, there will always be an opening square bracket, which we won't bother checking).

Snippet:

<?php

$str = "“foobar” [tagname id=“1035” linked=“true”] “another” [tagname id=“1”]";

// The u modifier makes PCRE treat the curly quotes as whole UTF-8 characters, not single bytes.
echo preg_replace('/(=“([^”]*)”)(?=.*\])/u', '="${2}"', $str);
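Since the question also allows JavaScript, here is a rough port of the same idea. The names attrQuotes and fixAttributeQuotes are hypothetical, and the port uses a single capture group instead of the two in the PHP pattern.

```javascript
// JavaScript port of the PHP pattern above: match =“some_chars” only
// if a closing ] appears somewhere later on the same line.
const attrQuotes = /=“([^”]*)”(?=.*\])/g;

// Hypothetical helper name, used only for this sketch.
function fixAttributeQuotes(str) {
  return str.replace(attrQuotes, '="$1"');
}

const text = '“foobar” [tagname id=“1035” linked=“true”] “another” [tagname id=“1”]';
console.log(fixAttributeQuotes(text));

// Caveat: .* in the lookahead accepts any later ] on the line, so an
// =“y” pair that sits before a shortcode is rewritten as well.
console.log(fixAttributeQuotes('x=“y” [tag id=“1”]'));
```

This relies on the input being uniformly formed, as the answer assumes. If attribute-like text can appear outside shortcodes, a pattern that rules out intervening brackets is more robust.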

Online Demo
