Decomposing a string into words separated by spaces, ignoring spaces within quoted strings, and cons

Time:05-26

How can I explode the following string:

+test +word any -sample (+toto +titi "generic test") -column:"test this" (+data id:1234)

into

Array('+test', '+word', 'any', '-sample', '(', '+toto', '+titi', '"generic test"', ')', '-column:"test this"', '(', '+data', 'id:1234', ')')

I would like to extend a boolean full-text search SQL query by adding the ability to target specific columns using the notation column:value or column:"valueA value B".

How can I do this using preg_match_all($regexp, $query, $result), i.e., what is the correct regular expression to use?

Or more generally, what would be the most appropriate regular expression to decompose a string into words that contain no spaces, where spaces inside quoted text do not count as separators, and where ( and ) are always words of their own, whether or not they are surrounded by spaces? For example, xxx"yyy zzz" should be considered a single word, and (aaa) should be three words: (, aaa and ).

I have tried something like /"(?:\\\\.|[^\\\\"])*"|\S+/, but with limited/no success.

Can anybody help?

CodePudding user response:

I think PCRE verbs can be used to achieve your goal:

preg_split('/".*?"(*SKIP)(*FAIL)|(\(|\))| /', '+test +word any -sampe (+toto +titi "generic test") -column:"test this" (+data id:1234)', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

https://3v4l.org/QnpB9
https://regex101.com/r/pw1mEd/1
https://3v4l.org/dNMkf (with test data)
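
To make the role of each alternative easier to see, here is a minimal sketch that wraps the same preg_split call in a function and runs it on the string from the question. The function name tokenize_query is an assumption made for illustration; the pattern and flags are the ones shown above.

```php
<?php
// Hypothetical wrapper; the name tokenize_query is illustrative only.
function tokenize_query(string $query): array
{
    // ".*?"(*SKIP)(*FAIL)  consume a quoted run, then reject it as a delimiter,
    //                      so spaces inside quotes never split the string
    // (\(|\))              a parenthesis: a delimiter kept through the capture group
    // a single space       a delimiter that is dropped
    $pattern = '/".*?"(*SKIP)(*FAIL)|(\(|\))| /';

    return preg_split(
        $pattern,
        $query,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

$tokens = tokenize_query('+test +word any -sample (+toto +titi "generic test") -column:"test this" (+data id:1234)');
print_r($tokens);

// The two extra cases from the question.
print_r(tokenize_query('xxx"yyy zzz" (aaa)'));
```

PREG_SPLIT_NO_EMPTY removes the empty pieces that would otherwise appear where a space sits directly next to a parenthesis.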

CodePudding user response:

If you want to match the various parts using alternations:

(?:[^\s()":]*:)?"[^"]+"|[^\s()]+|[()]

Explanation

  • (?: Non-capturing group, matched as a whole
    • [^\s()":]*: Match zero or more characters other than whitespace, ( ) " or :, then match a literal :
  • )? Close the non-capturing group and make it optional
  • "[^"]+" Match from an opening double quote to the closing double quote
  • | Or
  • [^\s()]+ Match one or more non-whitespace characters other than ( or )
  • | Or
  • [()] Match either ( or )
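
The breakdown above can also be written into the pattern itself with the x flag, which lets a PCRE pattern carry whitespace and # comments. This is a rough sketch and assumes the quantifiers in the pattern are + (one or more); the helper name match_tokens is hypothetical.

```php
<?php
// Hypothetical helper; the name match_tokens is illustrative only.
function match_tokens(string $query): array
{
    // With the x flag, unescaped whitespace in the pattern is ignored
    // and everything after # up to the end of the line is a comment.
    $re = '/
        (?: [^\s()":]* : )? "[^"]+"   # optional column: prefix, then a quoted phrase
      | [^\s()]+                      # run of characters that are not whitespace or parentheses
      | [()]                          # a single parenthesis
    /x';

    preg_match_all($re, $query, $matches);

    return $matches[0];
}

print_r(match_tokens('-column:"test this" (aaa) id:1234'));
print_r(match_tokens('xxx"yyy zzz"'));
```

Note that a quote glued to a preceding word without a colon, such as xxx"yyy zzz", appears to be split at the inner space by this pattern, because the second alternative also consumes the quote character. If that case matters, the pattern would need a further adjustment.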


Example code

$re = '/(?:[^\s()":]*:)?"[^"]+"|[^\s()]+|[()]/m';
$str = '+test +word any -sampe (+toto +titi "generic test") -column:"test this" (+data id:1234)';
preg_match_all($re, $str, $matches);
print_r($matches[0]);

Output

Array
(
    [0] => +test
    [1] => +word
    [2] => any
    [3] => -sampe
    [4] => (
    [5] => +toto
    [6] => +titi
    [7] => "generic test"
    [8] => )
    [9] => -column:"test this"
    [10] => (
    [11] => +data
    [12] => id:1234
    [13] => )
)
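
Once the query is tokenized, the column:value and column:"valueA value B" tokens mentioned in the question still have to be taken apart before they can be mapped to SQL. The following is only a sketch of one possible second step; the function name parse_token and the shape of the returned array are assumptions, not part of either answer.

```php
<?php
// Hypothetical helper: split one token into operator, column and value.
function parse_token(string $token): array
{
    // ([+-]?)        optional boolean operator in front of the column name
    // ([^\s()":]+):  column name, followed by the separating colon
    // (.+)           the value, possibly wrapped in double quotes
    if (preg_match('/^([+-]?)([^\s()":]+):(.+)$/', $token, $m)) {
        return [
            'operator' => $m[1],
            'column'   => $m[2],
            'value'    => trim($m[3], '"'),
        ];
    }

    // Not column-qualified: hand the token back unchanged.
    return ['operator' => '', 'column' => null, 'value' => $token];
}

foreach (['-column:"test this"', 'id:1234', '+test', '('] as $token) {
    var_export(parse_token($token));
    echo "\n";
}
```

Tokens without a colon, including parentheses and plain quoted phrases, come back with column set to null, so the caller can treat them as ordinary full-text terms.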