I'm trying to write a conditional Regex to achieve the following:
If the word "apple" or "orange" is present within a string:
there must be at least 2 occurrences of the word "HORSE" (upper-case)
else
there must be at least 1 occurrence of the word "HORSE" (upper-case)
What I wrote so far:
(?(?=((apple|orange).*))(HORSE.*){2}|(HORSE.*){1})
I was expecting this Regex to work as I'm following the pattern (?(?=regex)then|else).
However, it looks like (HORSE.*){1} is always evaluated instead. Why?
https://regex101.com/r/V5s8hV/1
CodePudding user response:
The conditional is nice for checking a condition in one place and using the outcome in another.
^(?=(?:.*?\b(apple|orange)\b)?)(.*?\bHORSE\b)(?(1)(?2))
- The condition is group one, inside an optional (?: non-capturing group).
- The second group matches the part up to HORSE, which we always need.
- (?(1)(?2)) is the conditional: if the first group succeeded, require the pattern of group two again.
See this demo at regex101 (more explanation on the right side)
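For readers outside PCRE: Python's built-in re module supports the group-number conditional (?(1)...) but, as far as I know, not the (?2) subroutine call, so a port has to spell the HORSE part out a second time. A minimal sketch under that assumption (the names PATTERN and is_valid and the sample strings are made up for illustration):

```python
import re

# Group 1 is set inside the lookahead only if apple/orange occurs as a word.
# (?(1)...) then demands a second HORSE; the repeated sub-pattern stands in
# for PCRE's (?2), which Python's re module does not provide.
PATTERN = re.compile(
    r"^(?=(?:.*?\b(apple|orange)\b)?)(.*?\bHORSE\b)(?(1).*?\bHORSE\b)"
)

def is_valid(text: str) -> bool:
    """Return True if text satisfies the HORSE rule (single-line input)."""
    return PATTERN.match(text) is not None

for s in ["HORSE", "apple HORSE", "apple HORSE HORSE", "orange and a pony"]:
    print(f"{s!r}: {is_valid(s)}")
```

Only the conditional part changes; the lookahead and the second group are the same as in the PCRE pattern above.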
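The way you planned it works as well, but it needs refactoring: the lookahead condition is only tested at the current position, so without a leading .*? it succeeds only where apple or orange starts right there, and HORSE can never match at that same position. At the position where HORSE starts the condition is false, which is why the else branch is the one that matches. See e.g. that regex101 demo.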
^(?(?=.*?\b(?:apple|orange)\b)(?:.*?\bHORSE\b){2}|.*?\bHORSE\b)
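Python's re module has no lookahead conditional, so this variant cannot be pasted there as is. One hedged workaround is to move the if/else out of the pattern and into code: test the condition first, then require the matching number of HORSE words. The helper name required_horses_ok is hypothetical.

```python
import re

CONDITION = re.compile(r"\b(?:apple|orange)\b")
HORSE = re.compile(r"\bHORSE\b")

def required_horses_ok(text: str) -> bool:
    """Emulate (?(?=cond)then|else) in code, assuming single-line input."""
    # "then" branch needs two HORSE words, "else" branch needs one.
    needed = 2 if CONDITION.search(text) else 1
    return len(HORSE.findall(text)) >= needed

for s in ["HORSE", "apple HORSE", "orange HORSE and HORSE"]:
    print(f"{s!r}: {required_horses_ok(s)}")
```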
Or another way, without a conditional and using a negative lookahead instead, like this demo at regex101.
^(?:(?!.*?\b(?:apple|orange)\b).*?\bHORSE\b|(?:.*?\bHORSE\b){2})
FYI: To get the full string in the output, just append .* at the end. Also worth mentioning: {1} is redundant. I used a lazy quantifier (as few as possible) in the dot-parts of all variants to improve efficiency.
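The variant without a conditional uses only features that Python's re module also has, so it can be tried there directly. The sketch below, with a made-up sample string, also shows the effect of appending .* to get the whole string as the match:

```python
import re

BASE = r"^(?:(?!.*?\b(?:apple|orange)\b).*?\bHORSE\b|(?:.*?\bHORSE\b){2})"
short = re.compile(BASE)         # match stops at the last required HORSE
full = re.compile(BASE + r".*")  # trailing .* extends the match to the line end

text = "one HORSE, an apple and another HORSE here"
print(short.match(text).group())   # one HORSE, an apple and another HORSE
print(full.match(text).group())    # one HORSE, an apple and another HORSE here
print(short.match("apple HORSE"))  # None
```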
CodePudding user response:
I would keep it simple and use lookaheads to assert the number of occurrences of the word HORSE:
^((?=.*\bHORSE\b.*\bHORSE\b).*\b(?:apple|orange)\b.*|(?=.*\bHORSE\b)(?!.*\b(?:apple|orange)\b).*)$
Demo
Explanation:
- ^ from the start of the string
- ( match either of
- (?=.*\bHORSE\b.*\bHORSE\b) assert that HORSE appears at least twice
- .* match any content
- \b(?:apple|orange)\b match apple or orange
- .* match any content
- | OR
- (?=.*\bHORSE\b) assert that HORSE appears at least once
- (?!.*\b(?:apple|orange)\b) but apple and orange do not occur
- .* match any content
- ) close alternation
- $ end of the string
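The same breakdown can be kept next to the pattern itself. As a sketch, assuming Python's re module and its re.VERBOSE flag (the answer does not name an engine, and the name RULE is made up), the annotated pattern could look like this:

```python
import re

RULE = re.compile(
    r"""
    ^                                   # from the start of the string
    (                                   # match either of
        (?=.*\bHORSE\b.*\bHORSE\b)      #   assert HORSE appears at least twice
        .*\b(?:apple|orange)\b.*        #   any content around apple or orange
    |                                   # OR
        (?=.*\bHORSE\b)                 #   assert HORSE appears at least once
        (?!.*\b(?:apple|orange)\b)      #   but apple and orange do not occur
        .*                              #   match any content
    )                                   # close alternation
    $                                   # end of the string
    """,
    re.VERBOSE,
)

for s in ["HORSE", "apple HORSE", "HORSE apple HORSE", "no match here"]:
    print(f"{s!r}: {bool(RULE.match(s))}")
```

In verbose mode whitespace and # comments inside the pattern are ignored, so the regex itself is unchanged from the one-line version.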