I have a list of strings and regexes, and I want to check which of them match an input string.
Let's say I have this list:
$list = [ // an array of strings/regexes that I want to check
    "lorem ipsum", // a phrase
    "example", // another word
    "/(nulla)/", // a regex
];
And the string:
$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";
Then I want to check it like this:
if( $matched_string >= 1 ){ // check if at least one item matched
    // do something...
    // output the matched strings: "lorem ipsum", "nulla"
}else{
    // nothing matched
}
How can I do something like that?
Answer:
I'm not sure whether this approach works for your case, but you could treat them all as regexes.
$list = [ // an array of strings/regexes that I want to check
    "lorem ipsum", // a phrase
    "Donec mattis",
    "example", // another word
    "/(nulla)/", // a regex
    "/lorem/i"
];
$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";
// matches strings of the form /.../ followed by optional PCRE modifiers
$is_regex = '/^\/.*\/[imsxuADSUXJ]*$/';
$list_matches = [];
foreach($list as $str){
    // create a regex from the string if it isn't already one,
    // escaping characters that have special meaning in a regex
    $patt = (preg_match($is_regex, $str))? $str: '/' . preg_quote($str, '/') . '/';
    $item_matches = [];
    preg_match($patt, $input_string, $item_matches);
    if(!empty($item_matches)){
        // only add to the list if it matches
        $list_matches[$str] = $item_matches;
    }
}
if(empty($list_matches)){
    echo 'No matches from the list found';
}else{
    var_export($list_matches);
}
The above will output the following:
array (
  'Donec mattis' =>
  array (
    0 => 'Donec mattis',
  ),
  '/(nulla)/' =>
  array (
    0 => 'nulla',
    1 => 'nulla',
  ),
  '/lorem/i' =>
  array (
    0 => 'Lorem',
  ),
)
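To connect this back to the `if` check in the question, the loop could be wrapped in a function and its result counted. This is only a sketch: `findMatches()` is a made-up name, and it assumes plain strings should match case-insensitively (the question expects "lorem ipsum" to match "Lorem ipsum"), so escaped plain strings get the `i` modifier. Drop the `i` if you need exact case.

```php
<?php
// Hypothetical helper based on the loop above.
// Assumption: plain strings are matched case-insensitively.
function findMatches(array $list, string $input): array
{
    $matches = [];
    foreach ($list as $item) {
        // /.../ with optional PCRE modifiers is treated as a regex
        $isRegex = preg_match('/^\/.*\/[imsxuADSUXJ]*$/', $item) === 1;
        // plain strings are escaped and given the (assumed) i modifier
        $pattern = $isRegex ? $item : '/' . preg_quote($item, '/') . '/i';
        if (preg_match($pattern, $input, $m) === 1) {
            $matches[$item] = $m[0]; // list entry => text it matched
        }
    }
    return $matches;
}

$list = ["lorem ipsum", "example", "/(nulla)/"];
$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";

$matched = findMatches($list, $input_string);
if (count($matched) >= 1) {
    echo 'Matched: ', implode(', ', array_keys($matched)), "\n";
} else {
    echo "Nothing matched\n";
}
```

Returning the matches (rather than echoing inside the loop) keeps the "did anything match?" decision in one `count()` call, which is the shape the question asked for.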
Answer:
Try the following:
<?php
$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";
$list = [ // an array of strings/regexes that I want to check
    "Lorem ipsum", // a phrase
    "consectetur", // another word
    "/(nu[a-z]{2}a)/", // a regex
];
$regex_list = [];
foreach($list as $line) {
    if ($line[0] == '/' and $line[-1] == '/') {
        // a regex: strip the surrounding '/' delimiters
        $regex_list[] = substr($line, 1, -1);
    } else {
        // a plain string: escape regex metacharacters, including the '/' delimiter
        $regex_list[] = preg_quote($line, '/');
    }
}
$regex = '/' . implode('|', $regex_list) . '/';
echo $regex, "\n";
preg_match_all($regex, $input_string, $matches, PREG_SET_ORDER);
print_r($matches);
Prints:
/Lorem ipsum|consectetur|(nu[a-z]{2}a)/
Array
(
    [0] => Array
        (
            [0] => Lorem ipsum
        )

    [1] => Array
        (
            [0] => consectetur
        )

    [2] => Array
        (
            [0] => nulla
            [1] => nulla
        )

)
Discussion and Limitations
In processing each element of $list, if the string begins and ends with '/', it is assumed to be a regular expression, and the '/' characters are removed from its start and end. Otherwise it is a "regular" string and is replaced by the result of calling preg_quote on it, which escapes characters that have special meaning in regular expressions. Finally, all the strings are joined with the regular-expression alternation operator, '|', and wrapped in '/' delimiters to create a single regular expression from the input.
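One possible variation on this combined-pattern idea, sketched under assumptions: wrapping each entry in a named group lets you see which list entry produced each match, since `PREG_SET_ORDER` alone only gives you the matched text. The helper names `combineList()` and `matchedEntries()` and the group names `item0`, `item1`, … are made up for illustration. It relies on `PREG_UNMATCHED_AS_NULL` (PHP 7.2+) so that groups that took no part in a match come back as null. It adds extra capture groups, so numbered backreferences inside the entries are still not safe to combine.

```php
<?php
// Hypothetical helper: build one pattern from the list, wrapping each entry in
// a named group "itemN" so a match can be traced back to its list entry.
function combineList(array $list): string
{
    $parts = [];
    foreach (array_values($list) as $i => $item) {
        $isRegex = strlen($item) > 1 && $item[0] === '/' && substr($item, -1) === '/';
        // strip the '/' delimiters from regexes, escape plain strings
        $body = $isRegex ? substr($item, 1, -1) : preg_quote($item, '/');
        $parts[] = "(?<item{$i}>{$body})";
    }
    return '/' . implode('|', $parts) . '/';
}

// Hypothetical helper: map each list entry that matched to the texts it matched.
function matchedEntries(array $list, string $input): array
{
    $list = array_values($list);
    preg_match_all(combineList($list), $input, $sets, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);
    $found = [];
    foreach ($sets as $set) {
        foreach ($list as $i => $item) {
            // exactly one alternative (named group) takes part in each match
            if (($set["item{$i}"] ?? null) !== null) {
                $found[$item][] = $set["item{$i}"];
                break;
            }
        }
    }
    return $found;
}

$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";
print_r(matchedEntries(["Lorem ipsum", "consectetur", "/(nu[a-z]{2}a)/"], $input_string));
```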
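The main limitation is that this does not handle backreferences correctly if multiple regular expressions in the input list have capture groups, since the group numbering changes when the regular expressions are combined. In addition, per-pattern modifiers are not supported: an entry such as "/lorem/i" does not end with '/', so it is treated as a plain string and matched literally.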
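A small illustration of the backreference problem (the two patterns are made-up examples, not from the question): each works on its own, but once they are joined with '|', the second pattern's `\1` still points at the first pattern's group, which did not match in that branch, so the combined regex fails. If you control the patterns, one workaround is PCRE's relative backreference syntax `\g{-1}`, which refers to the nearest group to its left and so survives the renumbering.

```php
<?php
// Two made-up list entries, each with its own group and backreference \1.
$first  = '(a)\1';   // matches "aa"
$second = '(b)\1';   // intended to match "bb"

// On its own, the second pattern matches "bb".
$alone = preg_match('/' . $second . '/', 'bb');

// Combined, "(b)" becomes group 2, but "\1" still refers to group 1 "(a)",
// which did not participate in this branch, so the branch cannot match.
$combined = preg_match('/' . $first . '|' . $second . '/', 'bb');

// Relative backreferences (\g{-1} = nearest preceding group) are unaffected.
$fixed = preg_match('/(a)\g{-1}|(b)\g{-1}/', 'bb');

var_dump($alone, $combined, $fixed);
```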
Answer:
Typically, I scream bloody murder if someone dares to stink up their code with error suppressors. But if your input data is so out of your control that you are allowing a mix of regex and non-regex input strings, then I guess you'll probably condone @ in your code as well.
Check whether each search string is a valid regex, as demonstrated here. If it is not, escape it with preg_quote() and wrap it in delimiters to form a valid regex pattern before matching it against the actual haystack string.
Code:
$list = [ // an array of strings/regexes that I want to check
    "lorem ipsum", // a phrase
    "example", // another word
    "/(nulla)/", // a valid regex
    "/[,.]/", // a valid regex
    "^dolor^", // a valid regex ('^' is the delimiter)
    "/path/to/dir/", // not a valid regex ("to/dir/" would be parsed as modifiers)
    "[integer]i", // a valid regex: [ ] are the delimiters here, not a character class
];
$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, /path/to/dir/ nulla ac suscipit maximus, leo metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";
$result = [];
foreach($list as $v) {
    if (@preg_match($v, '') === false) {
        // not a regex, make into one
        $v = '/' . preg_quote($v, '/') . '/';
    }
    preg_match($v, $input_string, $m);
    $result[$v] = $m[0] ?? null;
}
var_export($result);
Or you could write essentially the same thing this way (the result keys are then the original list entries rather than the generated patterns), but I don't know whether checking the pattern against a non-empty string has any performance cost:
$result = [];
foreach($list as $v) {
    if (@preg_match($v, $input_string, $m) === false) {
        preg_match('/' . preg_quote($v, '/') . '/', $input_string, $m);
    }
    $result[$v] = $m[0] ?? null;
}
var_export($result);
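If you would rather avoid @, one alternative is to install a temporary error handler that swallows the warning preg_match() raises for an invalid pattern, then restore the previous handler. This is a sketch, not the answer's own code: `isValidRegex()` and `toPattern()` are made-up helper names, and the arrow function requires PHP 7.4+.

```php
<?php
// Hypothetical helper: true if $pattern compiles as a PCRE pattern.
function isValidRegex(string $pattern): bool
{
    // Swallow the E_WARNING that preg_match() emits for an invalid pattern,
    // instead of suppressing it with @.
    set_error_handler(static fn (): bool => true);
    try {
        return preg_match($pattern, '') !== false;
    } finally {
        restore_error_handler(); // always put the previous handler back
    }
}

// Hypothetical helper: use valid regexes as-is, turn anything else into a literal pattern.
function toPattern(string $item): string
{
    return isValidRegex($item) ? $item : '/' . preg_quote($item, '/') . '/';
}

$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, /path/to/dir/ nulla ac suscipit maximus, leo metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";
$result = [];
foreach (["lorem ipsum", "/(nulla)/", "/path/to/dir/", "[integer]i"] as $v) {
    $pattern = toPattern($v);
    preg_match($pattern, $input_string, $m);
    $result[$pattern] = $m[0] ?? null;
}
var_export($result);
```

The `finally` block makes sure the original error handler is restored even if something inside the `try` throws.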