PHP preg_replace_callback creates false entries in matches for named groups

Time:07-07

I have a couple of "shortcode" blocks in a text, which I want to replace with some HTML entities on the fly using preg_replace_callback.

The syntax of a shortcode is simple:

[block:type-of-the-block attribute-name1:value attribute-name2:value ...]

Attributes with values may be provided in any order. Sample regex pattern I use to find these shortcode blocks:

/\[
    (?:block:(?<block>piechart))
    (?:
        (?:\s+value:(?<value>[0-9]+)) |
        (?:\s+stroke:(?<stroke>[0-9]+)) |
        (?:\s+angle:(?<angle>[0-9]+)) |
        (?:\s+colorset:(?<colorset>reds|yellows|blues))
    )*
\]/xumi

Now, here comes the funny thing: PHP matches non-existent named groups. For a string like this:

[block:piechart colorset:reds value:20]

...the resulting $matches array is (note the empty strings in "stroke" and "angle"):

array(11) {
  [0]=>
  string(39) "[block:piechart colorset:reds value:20]"
  ["block"]=>
  string(8) "piechart"
  [1]=>
  string(8) "piechart"
  ["value"]=>
  string(2) "20"
  [2]=>
  string(2) "20"
  ["stroke"]=>
  string(0) ""
  [3]=>
  string(0) ""
  ["angle"]=>
  string(0) ""
  [4]=>
  string(0) ""
  ["colorset"]=>
  string(4) "reds"
  [5]=>
  string(4) "reds"
}

Here's the code for testing (you can execute it online here as well: https://onlinephp.io/c/2429a):

$pattern = "
/\[
    (?:block:(?<block>piechart))
    (?:
        (?:\s+value:(?<value>[0-9]+)) |
        (?:\s+stroke:(?<stroke>[0-9]+)) |
        (?:\s+angle:(?<angle>[0-9]+)) |
        (?:\s+colorset:(?<colorset>reds|yellows|blues))
    )*
\]/xumi";
$subject = "here is a block to be replaced [block:piechart value:25   angle:720]  [block] and another one [block:piechart colorset:reds value:20]";
preg_replace_callback($pattern, 'callbackFunction', $subject);

function callbackFunction($matches)
{
    var_dump($matches);

    // process matched values, return some replacement...
    $replacement = "...";

    return $replacement;
}

Is it normal that PHP creates empty entries in the $matches array for named groups that never took part in the match? What am I doing wrong? How can I prevent PHP from creating these false entries, which simply shouldn't be there?

Any help or explanation would be deeply appreciated! Thanks!

Answer:

This behaviour is as expected, although not well documented. In the manual under "Subpatterns":

When the whole pattern matches, that portion of the subject string that matched the subpattern is passed back to the caller

and:

Consider the following regex matched against the string Sunday:

(?:(Sat)ur|(Sun))day

Here Sun is stored in backreference 2, while backreference 1 is empty

and also in the documentation of the PREG_UNMATCHED_AS_NULL flag (new as of version 7.2.0). From the manual:

If this flag is passed, unmatched subpatterns are reported as null; otherwise they are reported as an empty string.
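The difference is easy to see with the manual's own Sunday example. The following is only a minimal sketch using preg_match; the variable names $default and $withNull are made up for illustration:

```php
<?php
// The manual's example: group 1 (Sat) does not take part in the match.
$pattern = '/(?:(Sat)ur|(Sun))day/';

// Default behaviour: the unmatched group 1 is reported as an empty string.
preg_match($pattern, 'Sunday', $default);

// With the flag (PHP 7.2+): the unmatched group 1 is reported as null.
preg_match($pattern, 'Sunday', $withNull, PREG_UNMATCHED_AS_NULL);

var_dump($default[1]);   // empty string
var_dump($withNull[1]);  // null
var_dump($withNull[2]);  // "Sun"
```

In both cases the key for group 1 exists; only the value reported for it changes.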

Which then gives you a way to work around this behaviour (note that the flags parameter of preg_replace_callback itself was only added in PHP 7.4):

preg_replace_callback($pattern, 'callbackFunction', $subject, -1, $count, PREG_UNMATCHED_AS_NULL);

If you take this approach then in your callback you could filter the $matches array using array_filter to remove the NULL values.

$matches = array_filter($matches, function ($v) { return !is_null($v); });
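Put together, the whole thing could look roughly like the sketch below. It assumes PHP 7.4 or later and assumes the quantifiers in the question's pattern were meant to be `+`. The helper collectAttributes, the $seen variable and the `<div>` replacement are hypothetical placeholders, not part of the original code. The filter also drops the numeric duplicates so that only the named attributes remain:

```php
<?php
// Pattern from the question, assuming "+" quantifiers after \s and [0-9].
$pattern = '/\[
    (?:block:(?<block>piechart))
    (?:
        (?:\s+value:(?<value>[0-9]+)) |
        (?:\s+stroke:(?<stroke>[0-9]+)) |
        (?:\s+angle:(?<angle>[0-9]+)) |
        (?:\s+colorset:(?<colorset>reds|yellows|blues))
    )*
\]/xumi';

// Hypothetical helper: keep only named groups that actually matched.
function collectAttributes(array $matches): array
{
    return array_filter(
        $matches,
        function ($value, $key) {
            // null means "group did not participate"; integer keys are duplicates.
            return $value !== null && is_string($key);
        },
        ARRAY_FILTER_USE_BOTH
    );
}

$subject = '[block:piechart colorset:reds value:20]';
$seen = [];

$result = preg_replace_callback(
    $pattern,
    function (array $matches) use (&$seen) {
        $seen = collectAttributes($matches);
        // Placeholder replacement; real code would build the HTML here.
        return '<div>' . $seen['block'] . '</div>';
    },
    $subject,
    -1,
    $count,
    PREG_UNMATCHED_AS_NULL
);

var_dump($seen);
```

For this subject, $seen should contain only the block, value and colorset keys; stroke and angle are filtered out because they are reported as null.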

Demo on 3v4l.org