I have a couple of "shortcode" blocks in a text, which I want to replace with some HTML entities on the fly using preg_replace_callback.
The syntax of a shortcode is simple:
[block:type-of-the-block attribute-name1:value attribute-name2:value ...]
Attributes with values may be provided in any order. Here is a sample regex pattern I use to find these shortcode blocks:
/\[
(?:block:(?<block>piechart))
(?:
(?:\s+value:(?<value>[0-9]+)) |
(?:\s+stroke:(?<stroke>[0-9]+)) |
(?:\s+angle:(?<angle>[0-9]+)) |
(?:\s colorset:(?<colorset>reds|yellows|blues))
)*
\]/xumi
Now, here comes the funny thing: PHP matches non-existent named groups. For a string like this:
[block:piechart colorset:reds value:20]
...the resulting $matches array is (note the empty strings in "stroke" and "angle"):
array(11) {
[0]=>
string(39) "[block:piechart colorset:reds value:20]"
["block"]=>
string(8) "piechart"
[1]=>
string(8) "piechart"
["value"]=>
string(2) "20"
[2]=>
string(2) "20"
["stroke"]=>
string(0) ""
[3]=>
string(0) ""
["angle"]=>
string(0) ""
[4]=>
string(0) ""
["colorset"]=>
string(4) "reds"
[5]=>
string(4) "reds"
}
Here's the code for testing (you can execute it online here as well: https://onlinephp.io/c/2429a):
$pattern = "
/\[
(?:block:(?<block>piechart))
(?:
(?:\s+value:(?<value>[0-9]+)) |
(?:\s+stroke:(?<stroke>[0-9]+)) |
(?:\s+angle:(?<angle>[0-9]+)) |
(?:\s colorset:(?<colorset>reds|yellows|blues))
)*
\]/xumi";
$subject = "here is a block to be replaced [block:piechart value:25 angle:720] [block] and another one [block:piechart colorset:reds value:20]";
preg_replace_callback($pattern, 'callbackFunction', $subject);
function callbackFunction($matches)
{
var_dump($matches);
// process matched values, return some replacement...
$replacement = "...";
return $replacement;
};
Is it normal that PHP creates an entry in the $matches array for every named group as soon as the whole pattern matches, and leaves it as an empty string when that particular group did not match anything? What am I doing wrong? How can I prevent PHP from creating these false entries, which simply shouldn't be there?
Any help or explanation would be deeply appreciated! Thanks!
Answer:
This behaviour is as expected, although not well documented. In the manual under "Subpatterns":
When the whole pattern matches, that portion of the subject string that matched the subpattern is passed back to the caller
and:
Consider the following regex matched against the string Sunday:
(?:(Sat)ur|(Sun))day
Here Sun is stored in backreference 2, while backreference 1 is empty
and also in the documentation of the PREG_UNMATCHED_AS_NULL flag (new as of PHP 7.2.0; preg_replace_callback accepts it through its flags parameter as of PHP 7.4.0). From the manual:
If this flag is passed, unmatched subpatterns are reported as null; otherwise they are reported as an empty string.
Which then gives you a way to work around this behaviour:
preg_replace_callback($pattern, 'callbackFunction', $subject, -1, $count, PREG_UNMATCHED_AS_NULL);
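To see what the flag changes, the manual's Sunday example can be run both ways. This is only a minimal sketch using preg_match; the variable names $default and $withNull are made up for illustration.

```php
<?php
// The manual's "Sunday" example: group 1 (Sat) does not take part in the match.
$pattern = '/(?:(Sat)ur|(Sun))day/';

// Default behaviour: the unmatched group is reported as an empty string.
preg_match($pattern, 'Sunday', $default);

// With PREG_UNMATCHED_AS_NULL (PHP 7.2+): the unmatched group is reported as null.
preg_match($pattern, 'Sunday', $withNull, PREG_UNMATCHED_AS_NULL);

var_dump($default[1] === '');    // group 1 without the flag
var_dump($withNull[1] === null); // group 1 with the flag
var_dump($withNull[2]);          // group 2 holds "Sun" in both cases
```

An empty string cannot be told apart from a group that really matched nothing (for example `(a*)`), whereas null is unambiguous.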
If you pass this flag, then in your callback you can filter the $matches array using array_filter to remove the null values:
$matches = array_filter($matches, function ($v) { return !is_null($v); });
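Putting the pieces together, the whole thing might look roughly like the sketch below. It assumes PHP 7.4 or later (for the flags parameter) and assumes the pattern from the question uses `+` quantifiers. The helper extractAttributes() and the JSON replacement are hypothetical placeholders, not part of the original code. The helper also drops the numeric keys, so only the named attributes that were really present remain.

```php
<?php
// Pattern from the question (the + quantifiers are an assumption).
$pattern = '/\[
    (?:block:(?<block>piechart))
    (?:
        (?:\s+value:(?<value>[0-9]+)) |
        (?:\s+stroke:(?<stroke>[0-9]+)) |
        (?:\s+angle:(?<angle>[0-9]+)) |
        (?:\s+colorset:(?<colorset>reds|yellows|blues))
    )*
\]/xumi';

// Hypothetical helper: keep only named groups that actually matched.
function extractAttributes(array $matches): array
{
    $attributes = [];
    foreach ($matches as $key => $value) {
        // Named groups have string keys; unmatched groups are null with the flag.
        if (is_string($key) && $value !== null) {
            $attributes[$key] = $value;
        }
    }
    return $attributes;
}

$subject = 'one [block:piechart value:25 angle:720] two [block:piechart colorset:reds value:20]';

$result = preg_replace_callback(
    $pattern,
    function (array $matches): string {
        // Placeholder replacement: dump the attributes as JSON.
        return json_encode(extractAttributes($matches));
    },
    $subject,
    -1,
    $count,
    PREG_UNMATCHED_AS_NULL
);

echo $result, PHP_EOL;
```

Filtering on `!== null` rather than on emptiness keeps an attribute that legitimately matched an empty string, should the pattern ever allow one.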