Regular expression for nested brackets that contain a symbol

Time:03-24

I need to replace [with square brackets] only those parentheses that contain a comma, no matter at which nesting level they are.

Example of a raw string:

start (one, two, three(*)), some text (1,2,3), and (4, 5(*)), another
(four), interesting (five (6, 7)), text (six($)), here is (seven)

Expected result:

start [one, two, three(*)], some text [1,2,3], and [4, 5(*)], another
(four), interesting (five [6, 7]), text (six($)), here is (seven)

The best I could do doesn't cope with parts with nested parentheses:

preg_replace('~ \( ( [^()]+ (\([^,]+\))? , [^()]+ ) \) ~x', ' [$1]', $string);

// start (one, two, three(*)), some text [1,2,3], and (4, 5(*)), another (four), interesting (five [6, 7]), text (six($)), here is (seven)
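For reference, PCRE can match balanced parentheses with a recursive pattern, so a regex-based approach is possible when combined with a callback. The following is only a sketch under that assumption: the function name replaceCommaParens is made up for illustration, and a pair is bracketed only when a comma sits directly inside it (not inside a nested pair), which is what the expected result above implies.

```php
<?php
// Hypothetical helper, not part of the original question.
function replaceCommaParens(string $s): string
{
    // Matches one balanced (...) group; (?R) recurses into nested groups.
    $group = '~\((?:[^()]++|(?R))*+\)~';
    $result = preg_replace_callback($group, function (array $m) use ($group): string {
        $inner = substr($m[0], 1, -1);               // text between the outer parentheses
        $direct = preg_replace($group, '', $inner);  // same text with nested groups removed
        $processed = replaceCommaParens($inner);     // handle nested groups recursively
        $hasComma = is_string($direct) && strpos($direct, ',') !== false;
        return $hasComma ? "[$processed]" : "($processed)";
    }, $s);
    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $s;
}

echo replaceCommaParens('(five (6, 7)), (1,2,3)'), "\n"; // (five [6, 7]), [1,2,3]
```

Unbalanced parentheses are not matched by the pattern, so they are left as they are.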

CodePudding user response:

I would tokenise the input, splitting it by commas and parentheses while keeping those delimiters as tokens of their own. Then use a recursive algorithm to detect whether a comma appears directly inside a given pair of parentheses and make the appropriate replacement.
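To make the tokenising step concrete, here is a small sketch of what preg_split() returns with PREG_SPLIT_DELIM_CAPTURE; the short input string is just an illustration:

```php
<?php
// Split on "(", ")" and ",", keeping each delimiter as a token of its own.
$tokens = preg_split('~([(),])~', 'a (b, c(*)) d', 0, PREG_SPLIT_DELIM_CAPTURE);
echo json_encode($tokens), "\n";
// ["a ","(","b",","," c","(","*",")","",")"," d"]
```

Note the empty string between the two adjacent closing parentheses. It is harmless for this approach, because appending an empty token changes nothing.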

Here is a function doing the job:

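// recur() is declared at the top level: a named function nested inside
// replaceWithBrackets() would be redeclared (fatal error) on a second call.
function recur(&$tokens) {
    $comma = false;   // becomes true if a comma occurs directly at this level
    $replaced = "";
    while (true) {
        $token = current($tokens);
        next($tokens);
        // Stop at the closing parenthesis of this level or at the end of input
        if ($token === ")" || $token === false) break;
        if ($token === "(") {
            // Process the nested group and choose its delimiters
            [$substr, $subcomma] = recur($tokens);
            $replaced .= $subcomma ? "[$substr]" : "($substr)";
        } else {
            $comma = $comma || $token === ",";
            $replaced .= $token;
        }
    }
    return [$replaced, $comma];
}

function replaceWithBrackets($s) {
    $tokens = preg_split("~([(),])~", $s, 0, PREG_SPLIT_DELIM_CAPTURE);
    return recur($tokens)[0];
}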

CodePudding user response:

OK, this is not a regular expression, but in case you don't find one, the following algorithm is your plan B, with plenty of comments (it might be useful for someone, and that's what StackOverflow is for):

$str = "start (one, two, three(*)), some text (1,2,3), and (4, 5(*)), another " .
       "(four), interesting (five (6, 7)), text (six($)), here is (seven)";
echo $str . "<br/>";

$PARs = array(); // ◄■ POSITIONS OF FOUND "(".
$prev = false; // ◄■ FLAG : TRUE = THERE WAS A PREVIOUS COMMA.
$coma = false; // ◄■ FLAG : TRUE = THERE IS A COMMA INSIDE CURRENT "()".
for ( $i = 0; $i < strlen( $str ); $i++ )
  switch ( $str[ $i ] )
  {
     case "(" : array_push( $PARs, $i ); // ◄■ POSITION OF "(".
                if ( $coma )
                   $prev = true;
                $coma = false; // ◄■ TRUE ONLY IF "," IS FOUND BEFORE ")".
                break;
     case ")" : $pos = array_pop( $PARs ); // ◄■ POSITION OF PREVIOUS "(".
                if ( $coma ) // ◄■ IF THERE WAS COMMA IN CURRENT "()"...
                     {
                       $str[ $pos ] = "["; // ◄■ REPLACE "(".
                       $str[ $i ] = "]"; // ◄■ REPLACE ")".
                       $coma = false; // ◄■ CLEAR FLAG.
                     }
                elseif ( $prev ) // ◄■ IF THERE WAS NO COMMA IN CURRENT "()"
                     {           //    BUT THERE WAS COMMA IN OUTSIDE "()"...
                       $prev = false; // ◄■ CLEAR FLAG.
                       $coma = true; // ◄■ SET FLAG.
                     }
                break;
     case "," : if ( ! empty( $PARs ) ) // ◄■ IGNORE COMMAS IF NOT IN "()".
                   $coma = true;
                break;
  }

if ( $coma )
   {
     $str[ $pos ] = "["; // ◄■ REPLACE "(".
     $str[ $i-1 ] = "]"; // ◄■ REPLACE ")".
   }

echo $str . // ◄■ RESULT.
     // COMPARE WITH EXPECTED ▼
     "<br/>start [one, two, three(*)], some text [1,2,3], and [4, 5(*)], another " .
     "(four), interesting (five [6, 7]), text (six($)), here is (seven)";

For deeper nesting levels I think $prev should be turned into an array, i.e. a stack holding one flag per open parenthesis.
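A minimal sketch of that idea, assuming each open parenthesis gets its own stack entry holding its position and a comma flag. The function name bracketCommaGroups is made up for illustration, and this is one possible way to do it rather than a tested replacement for the code above:

```php
<?php
// Hypothetical helper: brackets every "(...)" pair that directly contains a comma.
function bracketCommaGroups(string $str): string
{
    $stack = []; // one entry per open "(": [position, commaSeen]
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];
        if ($ch === "(") {
            $stack[] = [$i, false];
        } elseif ($ch === "," && $stack) {
            // Flag only the innermost open pair; commas outside "()" are ignored.
            $stack[count($stack) - 1][1] = true;
        } elseif ($ch === ")" && $stack) {
            [$pos, $hasComma] = array_pop($stack);
            if ($hasComma) {
                $str[$pos] = "["; // replace the matching "("
                $str[$i] = "]";   // replace the current ")"
            }
        }
    }
    return $str;
}

echo bracketCommaGroups('(a, (b (c, d)))'), "\n"; // [a, (b [c, d])]
```

Because each level keeps its own flag, no state has to be restored after a nested pair closes. An unmatched ")" is skipped and an unclosed "(" is left unchanged.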

Edit : fixed bug found by @trincot (thanks).
