Conditional Regex not working as expected

Time:11-02

I'm trying to write a conditional Regex to achieve the following:

If the word "apple" or "orange" is present within a string:
   there must be at least 2 occurrences of the word "HORSE" (upper-case)
else 
   there must be at least 1 occurrence of the word "HORSE" (upper-case)
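
For reference, the rule can also be stated as a plain function, which is handy as a baseline when testing candidate patterns. This is only a sketch: it assumes "word" means a whole-word match (\b on both sides), which the question does not say explicitly, and the name satisfies_rule is made up for illustration.

```python
import re

def satisfies_rule(text: str) -> bool:
    """Reference check for the rule, without one big regex."""
    # Count whole-word, upper-case occurrences of HORSE.
    horses = len(re.findall(r"\bHORSE\b", text))
    # The stricter minimum applies when either fruit word is present.
    has_fruit = re.search(r"\b(?:apple|orange)\b", text) is not None
    required = 2 if has_fruit else 1
    return horses >= required
```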

What I wrote so far:

(?(?=((apple|orange).*))(HORSE.*){2}|(HORSE.*){1})

I was expecting this Regex to work as I'm following the pattern (?(?=regex)then|else).

However, it looks like (HORSE.*){1} is always evaluated instead. Why?

https://regex101.com/r/V5s8hV/1

CodePudding user response:

A conditional is useful for checking a condition in one place and using the outcome in another.

^(?=(?:.*?\b(apple|orange)\b)?)(.*?\bHORSE\b)(?(1)(?2))
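  • Group one captures apple or orange; it sits inside an optional non-capturing group (?: ... )? within a lookahead, so the lookahead always succeeds and group one is set only when one of the words is present
  • Group two matches everything up to and including the first HORSE, which is always required
  • (?(1)(?2)) is the conditional: if group one participated in the match, the pattern of group two is required a second time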
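
If you want to try the same idea in Python, note that the standard re module has no (?2) subroutine calls, so the sketch below repeats the group-two sub-pattern by hand inside the conditional. That is an adaptation, not the pattern from this answer verbatim, and the helper name matches_group_conditional is made up.

```python
import re

# Adaptation for Python's `re`: the `(?2)` call is replaced by a copy of
# the group-two sub-pattern, because `re` has no subroutine calls.
GROUP_CONDITIONAL = re.compile(
    r"^(?=(?:.*?\b(apple|orange)\b)?)"  # group 1 is set only if a fruit word exists
    r"(.*?\bHORSE\b)"                   # group 2: text up to the first HORSE
    r"(?(1).*?\bHORSE\b)"               # if group 1 is set, require a second HORSE
)

def matches_group_conditional(text: str) -> bool:
    return GROUP_CONDITIONAL.search(text) is not None
```

The lookahead is what keeps this safe: once it has succeeded with group one set, the engine does not go back into it to try the "zero times" option of the optional group.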

See this demo at regex101 (more explanation on the right side)


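The way you planned it works as well, but needs refactoring, as in that regex101 demo. In your pattern the lookahead has no .* in front of apple|orange, so the condition is true only where one of those words starts at the current position, and HORSE cannot match there. At every other position the condition is false, so the else branch is the only one that ever matches. Anchoring at ^ and letting the lookahead scan ahead fixes this: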

^(?(?=.*?\b(?:apple|orange)\b)(?:.*?\bHORSE\b){2}|.*?\bHORSE\b)

Another way, without a conditional but with a negative lookahead, as in this demo at regex101:

^(?:(?!.*?\b(?:apple|orange)\b).*?\bHORSE\b|(?:.*?\bHORSE\b){2})
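
This variant uses only features that Python's re module supports, so it can be checked quickly there. A small sketch; the sample strings are made up for illustration and are not from the question.

```python
import re

# Pattern copied from the answer; no conditional, only a negative lookahead.
NEG_LOOKAHEAD = re.compile(
    r"^(?:(?!.*?\b(?:apple|orange)\b).*?\bHORSE\b|(?:.*?\bHORSE\b){2})"
)

# Hypothetical sample inputs mapped to the expected outcome.
samples = {
    "one HORSE": True,
    "apple and one HORSE": False,
    "orange HORSE, HORSE": True,
    "no match here": False,
}

results = {text: NEG_LOOKAHEAD.search(text) is not None for text in samples}
for text, ok in results.items():
    print(f"{ok!s:5}  {text}")
```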

FYI: To get the full string in the output, just append .* at the end. Also, {1} is redundant. All variants use a lazy quantifier (as few as possible) in the dot parts to improve efficiency.
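
To see what appending .* changes, here is a short sketch in Python using the variant without a conditional (the only one of the three that Python's re accepts unchanged). The sample text is made up.

```python
import re

BASE = r"^(?:(?!.*?\b(?:apple|orange)\b).*?\bHORSE\b|(?:.*?\bHORSE\b){2})"
text = "apple HORSE x HORSE y"

# Without the trailing .* the match stops at the last required HORSE.
short = re.search(BASE, text).group(0)
# With .* appended the rest of the line is consumed as well.
full = re.search(BASE + r".*", text).group(0)

print(short)  # apple HORSE x HORSE
print(full)   # apple HORSE x HORSE y
```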

CodePudding user response:

I would keep it simple and use lookaheads to assert the number of occurrences of the word HORSE:

^((?=.*\bHORSE\b.*\bHORSE\b).*\b(?:apple|orange)\b.*|(?=.*\bHORSE\b)(?!.*\b(?:apple|orange)\b).*)$


Explanation:

  • ^ from the start of the string
  • ( match either of
    • (?=.*\bHORSE\b.*\bHORSE\b) assert that HORSE appears at least twice
    • .* match any content
    • \b(?:apple|orange)\b match apple or orange
    • .* match any content
    • | OR
    • (?=.*\bHORSE\b) assert that HORSE appears at least once
    • (?!.*\b(?:apple|orange)\b) assert that neither apple nor orange occurs
    • .* match any content
  • ) close alternation
  • $ end of the string
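
A quick way to sanity-check the pattern explained above is to run it in Python, whose re module supports everything it uses. This is only a sketch: the helper name matches_lookahead and the sample strings are made up, and single-line input is assumed since . does not match a newline by default.

```python
import re

# Pattern copied from the answer: two alternatives, each guarded by lookaheads.
LOOKAHEAD_ONLY = re.compile(
    r"^((?=.*\bHORSE\b.*\bHORSE\b).*\b(?:apple|orange)\b.*"
    r"|(?=.*\bHORSE\b)(?!.*\b(?:apple|orange)\b).*)$"
)

def matches_lookahead(text: str) -> bool:
    return LOOKAHEAD_ONLY.search(text) is not None

print(matches_lookahead("HORSE"))              # True
print(matches_lookahead("apple HORSE"))        # False
print(matches_lookahead("HORSE apple HORSE"))  # True
print(matches_lookahead("orange"))             # False
```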