I have a term, let's say dog, and an example set of predefined tags: red and big. I'm trying to write a regexp that matches only valid strings: those that contain any combination of the tags, where each tag may appear zero or one time. Tag order does not matter.
Examples of strings that should match:
dog
red dog
red big dog
big red dog
Examples of strings that should not match:
red red dog
red big red dog
small red dog
The direct approach of simply enumerating all possible combinations is a nightmare with dozens of terms.
This is where I've stopped for now:
/
(?: # group for repetition
(
red\s | big\s # a tag that ...
)(?! \1 ) # ... is not followed by itself
# > (replacing backref with a recursive backref
# > still doesn't work,
# > changing negative lookahead by a positive
# > still gives same undesired match on invalid strings)
){0,2} # such a term repeated 0 to [amount of terms] times
dog # followed by a 'dog'
/xs
This regexp matches all of the example strings, including the invalid ones, which is undesired.
Answer:
You may use this regex:
^(?!.*\b(big|red)\h.*\b\1\b)(?:big\h+|red\h+)*dog$
RegEx Details:
^ : Start
(?!.*\b(big|red)\h.*\b\1\b) : Fail the match if any of the keywords appears more than once
(?:big\h+|red\h+)* : Match 0 or more of the words big or red, each followed by 1+ horizontal whitespace
dog : Match dog
$ : End
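Since the question mentions dozens of tags, here is a minimal sketch of building the same kind of pattern from a list instead of typing the alternation by hand. It is written in Python as an assumption, since the thread does not name a language; the helper name build_tag_pattern is hypothetical, and because Python's re module has no \h, [ \t] stands in for horizontal whitespace.

```python
import re


def build_tag_pattern(tags, term):
    """Compile a regex that accepts `term` preceded by distinct tags in any order.

    Hypothetical helper for illustration; mirrors the answer's pattern.
    """
    alternation = "|".join(re.escape(tag) for tag in tags)
    escaped_term = re.escape(term)
    # Python's re has no \h, so [ \t] is used as a stand-in (an assumption).
    return re.compile(
        rf"^(?!.*\b({alternation})[ \t].*\b\1\b)"  # fail if any tag occurs twice
        rf"(?:(?:{alternation})[ \t]+)*"           # zero or more tags, each followed by whitespace
        rf"{escaped_term}$"                        # the term itself, then end of string
    )


pattern = build_tag_pattern(["red", "big"], "dog")

valid = ["dog", "red dog", "red big dog", "big red dog"]
invalid = ["red red dog", "red big red dog", "small red dog"]

for text in valid + invalid:
    print(f"{text!r}: {bool(pattern.search(text))}")
```

With the example strings from the question, the four valid strings should print True and the three invalid ones False. The pattern grows linearly with the number of tags, rather than with the number of tag orderings.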