In PHP I'd like to be able to limit the number of characters around a word I'm trying to match. Let's say:
$content contains the full text
$look_for contains the string I want to search for in that full text
I want this to be language-agnostic, including languages like Chinese that don't use spaces as separators. The full text will be UTF-8. The following code is what I tried:
preg_match("/(.*){0,10}$look_for(.*){0,10}/i", $content, $matches);
With this, $matches is empty. Isn't the {0,10} supposed to be limiting the characters around the intended word?
CodePudding user response:
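(.*){0,10} does not limit the number of characters: it repeats the group (.*), which already matches any number of characters, up to 10 times. Nested quantifiers like this are prone to what is known as "catastrophic backtracking". Use (.{10}) instead, which captures exactly 10 characters, or (.{0,10}) if fewer than 10 characters may surround the match.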
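<?php
$content = 'aaaaaaaabbbbbbbbcccccccc the quick brown fox jumps over the lazy dog xxxxxxxxyyyyyyyyzzzzzzzz';
$look_for = 'the quick brown fox jumps over the lazy dog';
// preg_quote() escapes any regex metacharacters in the search term.
// The u modifier makes . match whole UTF-8 characters instead of single bytes.
$regex = '/(.{10})' . preg_quote($look_for, '/') . '(.{10})/iu';
$matches = [];
preg_match($regex, $content, $matches);
var_dump($matches);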
Results in:
array(3) {
[0]=>
string(63) "bcccccccc the quick brown fox jumps over the lazy dog xxxxxxxxy"
[1]=>
string(10) "bcccccccc "
[2]=>
string(10) " xxxxxxxxy"
}
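Since the question mentions UTF-8 text without space separators, here is a minimal sketch of the same idea with a bounded {0,N} quantifier, so the match still succeeds when fewer than N characters surround the term. The function name match_with_context and the $radius parameter are made up for illustration and are not from the original post. The s modifier is an extra assumption so that the context may span line breaks.

```php
<?php
// Hypothetical helper: returns the preg_match() result array
// ([0] full match, [1] context before, [2] context after),
// or null when the term is not found or the regex fails.
function match_with_context(string $content, string $look_for, int $radius = 10): ?array
{
    $regex = '/(.{0,' . $radius . '})'
        . preg_quote($look_for, '/')      // escape regex metacharacters in the term
        . '(.{0,' . $radius . '})/isu';   // i: case-insensitive, s: . matches newlines, u: UTF-8

    $matches = [];
    if (preg_match($regex, $content, $matches) !== 1) {
        return null; // 0 means no match, false means an error
    }
    return $matches;
}

$content = '我今天早上在公园里看到一只很可爱的小狗在草地上跑来跑去';
$result = match_with_context($content, '小狗', 5);
var_dump($result);
```

With the sample text above, $result[1] should be "只很可爱的" and $result[2] should be "在草地上跑", that is, five characters on each side counted as characters rather than bytes. If preg_match() returns false, preg_last_error() reports the reason, for example an exceeded backtrack limit or invalid UTF-8 input.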