Regexp: match enumeration of predefined tags

Time:05-24

I have a term, say dog, and an example set of predefined tags: red and big. I'm trying to write a regexp that matches only valid strings: those that contain any combination of the tags before the term, where each tag appears zero or one time. Tag order does not matter.

Examples of strings that should match:

dog
red dog
red big dog
big red dog

Examples of strings that should not match:

red red dog
red big red dog
small red dog

The direct approach with just enumerating all possible combinations is a nightmare with dozens of terms.

This is where I've stopped for now:

/
    (?:                       # group for repetition
        (
            red\s | big\s     # a tag that ...
        )(?! \1 )             # ... is not followed by itself
                              # > (replacing the backreference with a recursive one
                              # > still doesn't work;
                              # > changing the negative lookahead to a positive one
                              # > still gives the same undesired match on invalid strings)


    ){0,2}                    # such a tag repeated 0 to [number of tags] times
    dog                       # followed by a 'dog'
/xs

This regexp matches all the strings, which is undesired.
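
A likely reason, sketched here in Python as an assumption about the cause rather than a full diagnosis: the pattern is not anchored, so the engine is free to skip the invalid prefix and match a valid tail of the string. In addition, the lookahead only rules out a tag that is directly followed by itself. The variable name `attempt` is just for illustration.

```python
import re

# The attempt above, transcribed for Python's re module
# (re.VERBOSE | re.DOTALL plays the role of the /xs flags).
attempt = re.compile(
    r"""
    (?:
        ( red\s | big\s )   # a tag that ...
        (?! \1 )            # ... is not immediately followed by itself
    ){0,2}
    dog
    """,
    re.VERBOSE | re.DOTALL,
)

# Each invalid string still contains a valid substring, and an
# unanchored search is free to start the match there.
for s in ["red red dog", "red big red dog", "small red dog"]:
    m = attempt.search(s)
    print(f"{s!r} -> matched substring: {m.group(0)!r}")
```

In each case the reported substring is a valid tail such as `red dog`, not the whole input.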

CodePudding user response:

You may use this regex:

^(?!.*\b(big|red)\h.*\b\1\b)(?:big\h+|red\h+)*dog$

RegEx Details:

  • ^: Start
  • (?!.*\b(big|red)\h.*\b\1\b): Negative lookahead; fail the match if any of the keywords appears more than once (\h is horizontal whitespace in PCRE)
  • (?:big\h+|red\h+)*: Match 0 or more of the words big or red, each followed by 1+ horizontal whitespace characters
  • dog: Match dog
  • $: End
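
With dozens of tags, the same pattern can be generated from a list instead of being written by hand. Below is a minimal sketch in Python, under two assumptions: `build_tag_pattern` is a hypothetical helper name, and because Python's `re` module does not support `\h`, the sketch substitutes `[ \t]` for it.

```python
import re

def build_tag_pattern(tags, term):
    """Hypothetical helper: build the lookahead-based pattern from a tag list."""
    alternation = "|".join(re.escape(t) for t in tags)
    return re.compile(
        rf"^(?!.*\b({alternation})[ \t].*\b\1\b)"  # fail if any tag appears twice
        rf"(?:(?:{alternation})[ \t]+)*"           # zero or more tags, each followed by whitespace
        rf"{re.escape(term)}$"                     # then the term itself
    )

pattern = build_tag_pattern(["red", "big"], "dog")

valid = ["dog", "red dog", "red big dog", "big red dog"]
invalid = ["red red dog", "red big red dog", "small red dog"]

for s in valid + invalid:
    print(f"{s!r}: {bool(pattern.match(s))}")
```

The pattern stays about as long as the tag list itself, since the alternation appears only twice no matter how many orderings are possible.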