How can I explode the following string:
test word any -sample ( toto titi "generic test") -column:"test this" ( data id:1234)
into
Array(' test', ' word', 'any', '-sample', '(', ' toto', ' titi', '"generic test"', ')', '-column:"test this"', '(', ' data', 'id:1234', ')')
I would like to extend the boolean fulltext search SQL query by adding the ability to target specific columns using the notation column:value or column:"valueA value B".
How can I do this using preg_match_all($regexp, $query, $result), i.e., what is the correct regular expression to use?
Or, more generally, what would be the most appropriate regular expression to decompose a string into words that contain no spaces, where spaces inside quoted text do not count as word separators, and where ( and ) are always words of their own, whether or not they are surrounded by spaces? For example, xxx"yyy zzz" should be considered a single word, and (aaa) should be three words: (, aaa and ).
I have tried something like /"(?:\\\\.|[^\\\\"])*"|\S+/, but with limited/no success.
Can anybody help?
CodePudding user response:
I think PCRE verbs can be used to achieve your goal:
$query = ' test word any -sampe ( toto titi "generic test") -column:"test this" ( data id:1234)';
$parts = preg_split('/".*?"(*SKIP)(*FAIL)|(\(|\))| /', $query, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($parts);
https://3v4l.org/QnpB9
https://regex101.com/r/pw1mEd/1
https://3v4l.org/dNMkf (with test data)
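To check how this split behaves on the edge cases from the question, here is a small sketch. The function name splitQuery and the sample input are my own, for illustration only; the pattern is the one shown above.

```php
<?php
// Hypothetical helper: the name is illustrative, not part of the answer above.
function splitQuery(string $query): array
{
    // ".*?"(*SKIP)(*FAIL)  consume a quoted run, then reject it as a split point
    // (\(|\))              split on a parenthesis and keep it (captured delimiter)
    // a single space       split on it and drop it
    $pattern = '/".*?"(*SKIP)(*FAIL)|(\(|\))| /';

    return preg_split($pattern, $query, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

print_r(splitQuery('xxx"yyy zzz" (aaa)'));
```

With this input the quoted part stays attached to xxx, and (aaa) comes back as three separate entries, which is what the question asks for.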
CodePudding user response:
If you want to match the various parts using alternations:
(?:[^\s()":]*:)?"[^"]+"|[^\s()]+|[()]
Explanation
(?:             Non-capture group, to match the prefix as a whole
[^\s()":]*:     Match optional non-whitespace chars other than ( ) " : and then match :
)?              Close the non-capture group and make it optional
"[^"]+"         Match from an opening double quote till the closing double quote
|               Or
[^\s()]+        Match 1+ non-whitespace chars other than ( or )
|               Or
[()]            Match either ( or )
Example code
$re = '/(?:[^\s()":]*:)?"[^"]+"|[^\s()]+|[()]/m';
$str = ' test word any -sampe ( toto titi "generic test") -column:"test this" ( data id:1234)';
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Output
Array
(
    [0] => test
    [1] => word
    [2] => any
    [3] => -sampe
    [4] => (
    [5] => toto
    [6] => titi
    [7] => "generic test"
    [8] => )
    [9] => -column:"test this"
    [10] => (
    [11] => data
    [12] => id:1234
    [13] => )
)
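One caveat: the pattern above only attaches a quoted part to a word when it follows a name: prefix, so the question's more general case xxx"yyy zzz" would be split at the space. A possible variant, as a sketch only (the pattern and the function name tokenize are my own assumptions, not part of the answer above), treats a word as any run of quoted strings and characters that are not whitespace, parentheses or quotes:

```php
<?php
// Hypothetical tokenizer: name and pattern are illustrative assumptions.
function tokenize(string $query): array
{
    // (?:"[^"]*"|[^\s()"])+  a word: any mix of quoted runs and plain characters
    // [()]                   a single parenthesis, always its own token
    preg_match_all('/(?:"[^"]*"|[^\s()"])+|[()]/', $query, $matches);

    return $matches[0];
}

print_r(tokenize('xxx"yyy zzz" (aaa) -column:"test this" id:1234'));
```

Note that an unbalanced double quote matches neither branch, so a stray " is silently skipped; whether that is acceptable depends on how strictly the query should be validated.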