I need to replace with [square brackets] only those parentheses that directly contain a comma, no matter at which nesting level they are.
Example of a raw string:
start (one, two, three(*)), some text (1,2,3), and (4, 5(*)), another
(four), interesting (five (6, 7)), text (six($)), here is (seven)
Expected result:
start [one, two, three(*)], some text [1,2,3], and [4, 5(*)], another
(four), interesting (five [6, 7]), text (six($)), here is (seven)
The best I could do doesn't cope with parts with nested parentheses:
preg_replace('~ \( ( [^()]+ (\([^,]+\))? , [^()]+ ) \) ~x', ' [$1]', $string);
// start (one, two, three(*)), some text [1,2,3], and (4, 5(*)), another (four), interesting (five [6, 7]), text (six($)), here is (seven)
CodePudding user response:
I would tokenise the input, splitting it on commas and parentheses and keeping those delimiters as tokens. Then use a recursive algorithm to detect whether a comma appears directly inside a given pair of parentheses, and make the replacement only for those pairs.
Here is a function doing the job:
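// Declared at top level: a named function nested inside another one would be
// redeclared (a fatal error) on the second call to replaceWithBrackets().
function recur(&$tokens) {
    $comma = false;   // whether a comma appears directly at this nesting level
    $replaced = "";
    while (true) {
        $token = current($tokens);
        next($tokens);
        if ($token === ")" || $token === false) break;
        if ($token === "(") {
            [$substr, $subcomma] = recur($tokens);
            $replaced .= $subcomma ? "[$substr]" : "($substr)";
        } else {
            $comma = $comma || $token === ",";
            $replaced .= $token;
        }
    }
    return [$replaced, $comma];
}

function replaceWithBrackets($s) {
    // Split on "(", ")" and ",", keeping the delimiters as tokens.
    $tokens = preg_split("~([(),])~", $s, 0, PREG_SPLIT_DELIM_CAPTURE);
    return recur($tokens)[0];
}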
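To make the tokenising step concrete, here is a small sketch of what that `preg_split` call returns. The input string is only an illustration, not taken from the question:

```php
<?php
// Illustrative input; any string containing "(", ")" and "," would do.
$tokens = preg_split("~([(),])~", "a (b, c(*))", 0, PREG_SPLIT_DELIM_CAPTURE);
echo json_encode($tokens), "\n";
// ["a ","(","b",","," c","(","*",")","",")",""]
```

Note the empty-string tokens between adjacent delimiters and at the end. They are harmless here, because appending an empty string to the result changes nothing.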
CodePudding user response:
OK, this is not a regular expression, but in case you don't find one, the following algorithm is your plan B. It has plenty of comments (it might be useful for someone, and that's what StackOverflow is for):
$str = "start (one, two, three(*)), some text (1,2,3), and (4, 5(*)), another " .
       "(four), interesting (five (6, 7)), text (six($)), here is (seven)";
echo $str . "<br/>";
$PARs = array(); // ◄■ POSITIONS OF FOUND "(".
$prev = false;   // ◄■ FLAG : TRUE = THERE WAS A PREVIOUS COMMA.
$coma = false;   // ◄■ FLAG : TRUE = THERE IS A COMMA INSIDE CURRENT "()".
for ( $i = 0; $i < strlen( $str ); $i++ )
    switch ( $str[ $i ] )
    {
        case "(" : array_push( $PARs, $i ); // ◄■ POSITION OF "(".
                   if ( $coma )
                       $prev = true;
                   $coma = false; // ◄■ TRUE ONLY IF "," IS FOUND BEFORE ")".
                   break;
        case ")" : $pos = array_pop( $PARs ); // ◄■ POSITION OF PREVIOUS "(".
                   if ( $coma ) // ◄■ IF THERE WAS COMMA IN CURRENT "()"...
                   {
                       $str[ $pos ] = "["; // ◄■ REPLACE "(".
                       $str[ $i ] = "]";   // ◄■ REPLACE ")".
                       $coma = false;      // ◄■ CLEAR FLAG.
                   }
                   elseif ( $prev ) // ◄■ IF THERE WAS NO COMMA IN CURRENT "()"
                   {                //    BUT THERE WAS COMMA IN OUTSIDE "()"...
                       $prev = false; // ◄■ CLEAR FLAG.
                       $coma = true;  // ◄■ SET FLAG.
                   }
                   break;
        case "," : if ( ! empty( $PARs ) ) // ◄■ IGNORE COMMAS IF NOT IN "()".
                       $coma = true;
                   break;
    }
if ( $coma ) // ◄■ FLAG STILL SET AFTER THE LOOP: THE LAST "()" IS PENDING.
{
    $str[ $pos ] = "[";   // ◄■ REPLACE "(".
    $str[ $i-1 ] = "]";   // ◄■ REPLACE ")".
}
echo $str . // ◄■ RESULT.
     // COMPARE WITH EXPECTED ▼
     "<br/>start [one, two, three(*)], some text [1,2,3], and [4, 5(*)], another " .
     "(four), interesting (five [6, 7]), text (six($)), here is (seven)";
For deeper levels I think $prev should be turned into an array.
Edit : fixed bug found by @trincot (thanks).
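The suggestion above about turning $prev into an array could be sketched as follows: instead of two shared flags, keep one comma flag per open parenthesis on the same stack as its position. This is only a sketch under that assumption, not the answer's original code; the function name `bracketCommaGroups` is made up for illustration.

```php
<?php
// Hypothetical helper: one comma flag per open "(", stored on a stack.
function bracketCommaGroups(string $str): string
{
    $stack = []; // each entry: [position of "(", comma seen directly inside it?]
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        switch ($str[$i]) {
            case "(":
                $stack[] = [$i, false];
                break;
            case ",":
                if ($stack) {
                    // Mark only the innermost open group; commas outside "()" are ignored.
                    $stack[count($stack) - 1][1] = true;
                }
                break;
            case ")":
                if (!$stack) {
                    break; // unmatched ")": leave it untouched
                }
                [$pos, $hasComma] = array_pop($stack);
                if ($hasComma) {
                    $str[$pos] = "["; // replace the matching "("
                    $str[$i] = "]";   // replace this ")"
                }
                break;
        }
    }
    return $str;
}

echo bracketCommaGroups('x (a, (b, c)) y (d (e))'), "\n";
// x [a, [b, c]] y (d (e))
```

Because each group carries its own flag, a comma seen in an inner group never leaks into the outer one, and no fix-up is needed after the loop.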