I have a list of keywords, for example: key1, key2, key3, key4, ...
I also have a string that contains free text and key:value pairs. I need to determine whether it is valid or invalid. Here are the rules:
- If the string does NOT contain any keyword => VALID
- The key in a key:value pair must belong to the keyword list.
- The value in a key:value pair must not be empty.
- There must be no space before or after the colon in a key:value pair.
- Do not accept any key:value pair whose key is not in the keyword list.
Example:
aaa bbb ccc => VALID
aaa bbb key1:aaa ccc ddd => VALID
aaa bbb key2:aaa key1:bbb => VALID
aaa bbb key1 ccc => INVALID (key1 does not follow the key:value rule)
aaa bbb key1:xxx key2: => INVALID (key2 does not follow the key:value rule)
aaa bbb key1: xxx ccc => INVALID (a space after the colon is not accepted)
aaa bbb key1 :xxx ccc => INVALID (a space before the colon is not accepted)
aaa bbb aakey1 => VALID (aakey1 is not a keyword)
aaa bbb key1aa => VALID (key1aa is not a keyword)
aaa bbb xxx:yyy zzz => INVALID (do not accept any key:value pair whose key is not in the keyword list)
I have tried
/^(?: ?(?:\b(?:key1|key2)(?:: *\S+|(*SKIP)(*FAIL))|[^: ]+))*$/
It works well but does not cover the case aaa bbb key1aa, which is reported as invalid instead of valid.
Also, how can I extract the invalid keywords? I need to show an error message on the frontend.
Example: for aaa bbb key1 ccc, the key in error is key1, so I can show a message such as: "Key1 is invalid format, please check it again...".
Please help me take a look.
CodePudding user response:
Your regex can look like
^((key1|key2|key3):\w+|\b(?!(?2)\b)\w+)(?:\h+(?1))*$
or, with any whitespace (\s+) allowed between tokens instead of only horizontal whitespace:
^((key1|key2|key3):\w+|(?!(?2)\b)\w+)(?:\s+(?1))*$
Details:
^ - start of string
((key1|key2|key3):\w+|(?!(?2)\b)\w+) - Group 1:
  (key1|key2|key3):\w+ - a valid key (captured into Group 2), then :, then one or more word chars
  | - or
  (?!(?2)\b)\w+ - a word that is not a key
(?:\h+(?1))* - zero or more repetitions of:
  \h+ - one or more horizontal whitespaces
  (?1) - the Group 1 pattern recursed/repeated
$ - end of string
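As a rough illustration, the validation pattern above could be applied from PHP along these lines. This is only a sketch: the function name isValidQuery and the way the keyword list is interpolated into the pattern are assumptions made for the example, not part of the answer.

```php
<?php
// Hypothetical helper (name chosen for illustration): returns true when every
// token is either a free word or a key:value pair with a known key.
function isValidQuery(string $input, array $keywords): bool
{
    // Escape each keyword so it is safe inside a "/"-delimited pattern.
    $keys = implode('|', array_map(fn($k) => preg_quote($k, '/'), $keywords));
    // Group 1 = one token; Group 2 = the keyword alternation, reused via (?2).
    $pattern = '/^((' . $keys . '):\w+|\b(?!(?2)\b)\w+)(?:\h+(?1))*$/';
    return preg_match($pattern, $input) === 1;
}

var_dump(isValidQuery('aaa bbb key1:aaa ccc ddd', ['key1', 'key2', 'key3']));
var_dump(isValidQuery('aaa bbb key1 ccc', ['key1', 'key2', 'key3']));
```

Because values are matched with \w+, a value containing characters such as - or . would be rejected by this sketch; widen that part (for example to \S+) if your values need it.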
To collect keys that are not on your list, you can use
\b(?!(?:key1|key2|key3):)\w+(?=:\b)
Details:
\b - a word boundary
(?!(?:key1|key2|key3):) - a negative lookahead that fails the match if key1, key2 or key3 (etc.) followed by : appears immediately to the right
\w+ - one or more word chars
(?=:\b) - immediately to the right, there must be a : followed by a word char.
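A minimal PHP sketch of collecting those keys with preg_match_all is shown here. The function name findUnknownKeys is hypothetical and only serves the example.

```php
<?php
// Hypothetical helper: returns keys used in key:value form that are not
// part of the keyword list.
function findUnknownKeys(string $input, array $keywords): array
{
    // Escape each keyword so it is safe inside a "/"-delimited pattern.
    $keys = implode('|', array_map(fn($k) => preg_quote($k, '/'), $keywords));
    $pattern = '/\b(?!(?:' . $keys . '):)\w+(?=:\b)/';
    preg_match_all($pattern, $input, $matches);
    return $matches[0]; // full matches only, i.e. the unknown key names
}

print_r(findUnknownKeys('aaa bbb xxx:yyy key1:zzz', ['key1', 'key2', 'key3']));
```

Note that this pattern only reports unknown keys. A known key with a missing value, such as key2:, is not matched by it and needs a separate check.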
CodePudding user response:
Looking at the example data, you want to show the wrong keys. In that case, you can match the wrong keys, and if there is no wrong key the string would be valid.
The pattern to detect wrong keys:
\b(?:key1|key2)\b:?(?:\h+|$)|\b(?!(?:key1|key2)\b)[^\s:]+:
Explanation:
\b(?:key1|key2)\b - match key1 or key2 between word boundaries
:?(?:\h+|$) - match an optional : and then either 1+ horizontal whitespace chars or the end of the string
| - or
\b - a word boundary
(?!(?:key1|key2)\b) - negative lookahead, assert that key1 or key2 followed by a word boundary is not directly to the right
[^\s:]+ - match 1+ non-whitespace chars other than :
: - then match :
Example code:
$keywords = ["key1", "key2"];
// Escape each keyword for use inside a regex delimited by "/".
$escaped = implode('|', array_map(fn($k) => preg_quote($k, '/'), $keywords));
// First alternative: a known key without a valid value.
// Second alternative: an unknown key followed by a colon.
$pattern = "/\b(?:" . $escaped . ")\b" . ":?(?:\h+|$)|\b(?!(?:" . $escaped . ")\b)[^\s:]+:/";
$strings = [
    "aaa bbb key1 ccc",
    "aaa bbb key1:xxx key2:",
    "aaa bbb key1: xxx ccc",
    "aaa bbb key1 :xxx ccc",
    "aaa bbb xxx:yyy zzz",
    "aaa bbb ccc",
    "aaa bbb key1:aaa ccc ddd",
    "aaa bbb key2:aaa key1:bbb",
    "aaa bbb aakey1",
    "aaa bbb key1aa",
    "aaa bbb key1aa"
];
foreach ($strings as $s) {
    preg_match_all($pattern, $s, $matches);
    if (count($matches[0]) > 0) {
        // Every match is a wrongly formatted or unknown key.
        foreach ($matches[0] as $m) {
            echo "Invalid string -->'$m' is invalid format, please check it again..." . PHP_EOL;
        }
    } else {
        echo "Valid string --> '$s'" . PHP_EOL;
    }
}
Output
Invalid string -->'key1 ' is invalid format, please check it again...
Invalid string -->'key2:' is invalid format, please check it again...
Invalid string -->'key1: ' is invalid format, please check it again...
Invalid string -->'key1 ' is invalid format, please check it again...
Invalid string -->'xxx:' is invalid format, please check it again...
Valid string --> 'aaa bbb ccc'
Valid string --> 'aaa bbb key1:aaa ccc ddd'
Valid string --> 'aaa bbb key2:aaa key1:bbb'
Valid string --> 'aaa bbb aakey1'
Valid string --> 'aaa bbb key1aa'
Valid string --> 'aaa bbb key1aa'
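The matches above still include the trailing colon or space that the pattern consumes. If only the key name is wanted for the frontend message, one option is to trim each match, as in this sketch. The function name findInvalidKeys is hypothetical, and ucfirst is used only to mimic the "Key1 is invalid format" wording from the question.

```php
<?php
// Hypothetical helper: returns the offending key names without the trailing
// colon or whitespace that the pattern consumes.
function findInvalidKeys(string $input, array $keywords): array
{
    // Escape each keyword so it is safe inside a "/"-delimited pattern.
    $keys = implode('|', array_map(fn($k) => preg_quote($k, '/'), $keywords));
    $pattern = '/\b(?:' . $keys . ')\b:?(?:\h+|$)|\b(?!(?:' . $keys . ')\b)[^\s:]+:/';
    preg_match_all($pattern, $input, $matches);
    // Strip the colon, spaces and tabs left at the end of each match.
    return array_map(fn($m) => rtrim($m, ": \t"), $matches[0]);
}

foreach (findInvalidKeys('aaa bbb key1 ccc xxx:yyy', ['key1', 'key2']) as $key) {
    echo ucfirst($key) . ' is invalid format, please check it again...' . PHP_EOL;
}
```

An empty array from this helper means the string is valid under the rules in the question.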