I have the following string
huile contains rgbgrbrb9gr && huile contains fcecec
I use this regex to capture a condition block:
(.+) (contains) (.+)
It works with one block, "huile contains rgbgrrb9gr", but if I add another condition with the && or || operator, the match spans both blocks, operator included. What I expect is to capture the two blocks separately, excluding the && and || operators.
Does anyone have an idea how to achieve this?
CodePudding user response:
Regex quantifiers are greedy by default: they match as much input as they can.
You need to exclude & and | from your input, like this:
([^&|]+) (contains) ([^&|]+)
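As a rough illustration, here is how the character-class pattern (written with `+` quantifiers) behaves in Python's `re` module. The function name `find_conditions` is made up for this sketch. Note that the raw captures keep the spaces that sit next to the operator, so the sketch strips them.

```python
import re

# Pattern from the answer above, with the `+` quantifiers written out.
CONDITION = re.compile(r"([^&|]+) (contains) ([^&|]+)")


def find_conditions(text):
    """Return (left, keyword, right) tuples with surrounding spaces removed.

    find_conditions is a hypothetical helper name, not part of any library.
    """
    return [
        tuple(part.strip() for part in groups)
        for groups in CONDITION.findall(text)
    ]


print(find_conditions("huile contains rgbgrbrb9gr && huile contains fcecec"))
```

Without the `strip()` call, the first right-hand value would come back with a trailing space and the second left-hand value with a leading space, because a space is not excluded by `[^&|]`.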
If you instead want to exclude only the two-character operators && and ||, I suggest splitting your string on those delimiters first and then matching each piece with a regex, as complex parsing is really beyond the realm of regex (it is really a job for a grammar).
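A minimal sketch of the split-then-match idea in Python. The names `OPERATOR`, `BLOCK` and `parse_conditions` are invented for the example, and it assumes each block has the shape "left contains right".

```python
import re

# Split on the two-character operators only; a single & or | is left alone.
OPERATOR = re.compile(r"\s*(?:&&|\|\|)\s*")
# The original pattern from the question, applied to one block at a time.
BLOCK = re.compile(r"(.+) (contains) (.+)")


def parse_conditions(text):
    """Split text on && / || and return one (left, keyword, right) per block."""
    conditions = []
    for block in OPERATOR.split(text.strip()):
        match = BLOCK.fullmatch(block)
        if match:  # silently skip blocks that are not "x contains y"
            conditions.append(match.groups())
    return conditions


print(parse_conditions("huile contains rgbgrbrb9gr && huile contains fcecec"))
```

Because the operators are removed before matching, the simple greedy pattern is safe again, and values that contain a single & or | survive intact.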
But a regex solution is nonetheless possible.
The rough idea is that you want to match a string with
- an optional prefix containing no & or |
- a single & or | followed by a non-empty string
- the previous item repeated a non-zero number of times.
The subpattern would be something like this:
(([^&|]+)?([&|][^&|]+)+)
Additionally, you'll want something like egrep's -x flag to match the entire string; otherwise it's possible that an empty string turns up.
The full regex would look something like this (capture groups are renumbered):
(([^&|]+)?([&|][^&|]+)+) (contains) (([^&|]+)?([&|][^&|]+)+)
CodePudding user response:
What about using:
(.*?) contains (.*?)\s*(?:([|&])\3|$)\s*
See the online demo
- (.*?) - 1st capture group to catch whatever comes before 'contains' (lazy).
- contains - Literally ' contains ', with a leading and trailing space char.
- (.*?) - 2nd capture group to catch whatever comes after 'contains' (lazy).
- \s* - 0+ whitespace chars.
- (?:([|&])\3|$) - A non-capture group with an alternation inside:
  - ([|&])\3 - Either a double pipe symbol or a double ampersand;
  - $ - Or the end-of-string anchor.
- \s* - 0+ whitespace chars.
Your substrings will be captured in the 1st and 2nd capture groups. And if you really want to capture 'contains' too, it's an easy fix inside the pattern.
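For anyone who wants to try this pattern outside an online tester, here is a small Python sketch. `extract_pairs` is a made-up helper name. It uses `finditer` rather than `findall` so that the third group, which only exists to support the `\3` backreference, is left out of the result.

```python
import re

# Pattern from this answer, unchanged.
PATTERN = re.compile(r"(.*?) contains (.*?)\s*(?:([|&])\3|$)\s*")


def extract_pairs(text):
    """Return (before, after) for every 'before contains after' block."""
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]


print(extract_pairs("huile contains rgbgrbrb9gr && huile contains fcecec"))
```

The trailing `\s*` consumes the space after the operator, which is why the second block's left-hand value comes back without a leading space.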
CodePudding user response:
After reading the comments on the post, the desired result became clearer.
This one could work too:
(?<=^|(?:&&|\|\|) )(.+?) (contains) (.+?)(?= (?:&&|\|\|)|$)
https://regex101.com/r/YDFpN9/2
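One caveat when porting this: the lookbehind has alternatives of different lengths, and Python's `re` module only accepts fixed-width lookbehinds. The sketch below is therefore an adaptation rather than the pattern as posted: the start anchor is pulled out of the lookbehind and each operator gets its own fixed-width lookbehind. `find_blocks` is a hypothetical helper name.

```python
import re

# Adapted for Python's `re`: the original variable-width lookbehind
# (?<=^|(?:&&|\|\|) ) is rewritten as ^ or one of two 3-character lookbehinds.
PATTERN = re.compile(
    r"(?:^|(?<=&& )|(?<=\|\| ))"   # start of string, or just after "&& " / "|| "
    r"(.+?) (contains) (.+?)"      # lazy left side, keyword, lazy right side
    r"(?= (?:&&|\|\|)|$)"          # stop before " &&" / " ||" or at the end
)


def find_blocks(text):
    """Return (left, keyword, right) tuples for each condition block."""
    return PATTERN.findall(text)


print(find_blocks("huile contains rgbgrbrb9gr && huile contains fcecec"))
```

Since the operators and their surrounding spaces are only ever looked at, never consumed into a group, the captured values need no trimming.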
CodePudding user response:
If you want 3 capture groups, you could first match what you don't want, and then capture what you want to keep in groups, using a tempered greedy token so that the match does not cross &&, ||, or the word contains.
\|{2,}|&{2,}|((?:(?!&&|\|\||\bcontains\b).)*) (contains) ((?:(?!&&|\|\||\bcontains\b).)*)
The pattern matches:
- \|{2,}|&{2,} Match either 2 or more pipe chars or ampersands (what you don't want to keep)
- | Or
- ( Capture group 1
  - (?:(?!&&|\|\||\bcontains\b).)* Match any char except a newline if what is directly to the right is not &&, || or contains
- ) Close group 1
- (contains) Match the word contains in group 2 between spaces
- ( Capture group 3
  - (?:(?!&&|\|\||\bcontains\b).)* Same approach as above
- ) Close group 3
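With this "match what you don't want, capture what you do" approach, the calling code has to discard the matches where the operator branch fired, since their groups are empty. A hedged Python sketch follows. The `keep_conditions` name and the decision to strip surrounding spaces are assumptions of the example, not part of the answer.

```python
import re

# Tempered greedy token: consume a char only if it does not start
# "&&", "||" or the whole word "contains".
TOKEN = r"(?:(?!&&|\|\||\bcontains\b).)*"
PATTERN = re.compile(
    r"\|{2,}|&{2,}|(" + TOKEN + r") (contains) (" + TOKEN + r")"
)


def keep_conditions(text):
    """Return (left, keyword, right) tuples, skipping operator-only matches."""
    results = []
    for m in PATTERN.finditer(text):
        if m.group(2) is None:
            continue  # the && / || branch matched; nothing was captured
        results.append((m.group(1).strip(), m.group(2), m.group(3).strip()))
    return results


print(keep_conditions("huile contains rgbgrbrb9gr && huile contains fcecec"))
```

The `strip()` calls are there because the token happily consumes the space next to an operator, so group 3 of the first block and group 1 of the second would otherwise carry a stray space.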