How it works currently
I am able to capture the values between the brackets:
[[two b][three c]]
The result is
two b
three c
The regex for that:
\[\[(.+?)\]\[(.+?)\]\]
When I use this string
[[one a]]
Nothing is captured and that is how I expect it. Fine.
The problem
I combine the strings
[[one a]] and [[two b][three c]]
This is captured
one a]] and [[two b
three c
What I understand
As I understand it, one possible approach could be to negate the ]] string, but I don't know how to do this, and I am not sure if it is the right approach.
CodePudding user response:
The . char matches any char other than line break chars, and the fact that it is quantified with a lazy quantifier does not restrict it from matching basically any char. Matches are searched for from left to right, so the [[ matched is the leftmost [[, and the next ][ is matched regardless of whether there was a [ or ] in between.
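To make the left-to-right behaviour concrete, here is a minimal sketch in Python (the question does not name a regex flavor, so Python's re module is an assumption). It runs the pattern from the question against the combined string:

```python
import re

# Pattern from the question: a lazy ".+?" between the bracket pairs.
lazy = re.compile(r"\[\[(.+?)\]\[(.+?)\]\]")

text = "[[one a]] and [[two b][three c]]"

# The match starts at the leftmost "[[", and ".+?" keeps expanding
# until the first "][" is found, crossing "]]" and "[[" on the way.
match = lazy.search(text)
print(match.groups())  # ('one a]] and [[two b', 'three c')

# On its own, "[[one a]]" contains no "][", so nothing matches.
print(lazy.search("[[one a]]"))  # None
```

The lazy quantifier only controls how far the match extends to the right; it never moves the starting point away from the leftmost [[.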
So, one approach is to exclude any square brackets between [[ and ][ using a negated character class, [^\]\[]:
\[\[([^\]\[]+)\]\[([^\]\[]+)\]\]
See the regex demo. Here, [^\]\[]+, which replaced .+?, matches one or more chars other than [ and ].
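A quick check of the negated character class approach, again sketched in Python as an assumed flavor:

```python
import re

# Negated character class: the captured parts may not contain "[" or "]".
negated = re.compile(r"\[\[([^\]\[]+)\]\[([^\]\[]+)\]\]")

text = "[[one a]] and [[two b][three c]]"

# "[[one a]]" can no longer start a match, because the class cannot
# cross "]]" to reach the "][" further to the right.
pairs = negated.findall(text)
print(pairs)  # [('two b', 'three c')]
```

The trade-off is that a single [ or ] inside a value also prevents a match, which may or may not be what you want.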
Another approach is the one you mention, namely, to match any chars that do not start [[ (and probably ]], too) before ][:
\[\[((?:(?!\[\[).)*?)\]\[(.*?)\]\]
\[\[((?:(?!\[\[|\][\]\[]).)*)\]\[(.*?)\]\]
See this regex demo.
The (?:(?!\[\[).)*? part matches any char (.), zero or more but as few as possible occurrences (*?), that does not start a [[ char sequence ((?!\[\[)).
The (?:(?!\[\[|\][\]\[]).)* part matches any char (.), zero or more but as many as possible occurrences (*), that does not start a [[, ]] or ][ char sequence ((?!\[\[|\][\]\[])).
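Both lookahead-based patterns can be compared side by side. This is a sketch in Python (an assumed flavor); the second input string, with a single bracket pair inside a value, is made up for illustration:

```python
import re

# Lazy variant: each consumed char must not start "[[".
tempered_lazy = re.compile(r"\[\[((?:(?!\[\[).)*?)\]\[(.*?)\]\]")
# Greedy variant: each consumed char must not start "[[", "]]" or "][".
tempered_greedy = re.compile(r"\[\[((?:(?!\[\[|\][\]\[]).)*)\]\[(.*?)\]\]")

text = "[[one a]] and [[two b][three c]]"
lazy_pairs = tempered_lazy.findall(text)
greedy_pairs = tempered_greedy.findall(text)
print(lazy_pairs)    # [('two b', 'three c')]
print(greedy_pairs)  # [('two b', 'three c')]

# Hypothetical input: single brackets inside the first value are allowed,
# because only the two-char sequences are excluded.
nested = "[[a [x] b][c]]"
nested_lazy = tempered_lazy.findall(nested)
nested_greedy = tempered_greedy.findall(nested)
print(nested_lazy)    # [('a [x] b', 'c')]
print(nested_greedy)  # [('a [x] b', 'c')]
```

Unlike the negated character class, these patterns only reject the two-character delimiters, so lone [ or ] chars inside a value still match.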
Depending on the regex flavor, you can get rid of some backslashes in this regex.