Combine negated characters and capture groups

Time:04-19

How it works currently

I am able to capture the values between the brackets:

[[two b][three c]]

The result is

two b
three c

The regex for that is

\[\[(.+?)\]\[(.+?)\]\]

When I use this string

[[one a]]

Nothing is captured and that is how I expect it. Fine.

The problem

I combine the strings

[[one a]] and [[two b][three c]]

This is captured

one a]] and [[two b
three c

What I understand

As I understand it, one possible approach would be to negate the ]] string. But I don't know how to do this, and I am not sure whether it is the right approach.

CodePudding user response:

The . char matches any char other than line break chars, and quantifying it with a lazy quantifier does not stop it from matching basically any char. Matches are searched for from left to right, so the [[ that gets matched is the leftmost [[, and the next ][ is matched regardless of whether there was a [ or ] in between.
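To see this left-to-right behaviour concretely, here is a minimal Python sketch. It assumes the question's pattern is \[\[(.+?)\]\[(.+?)\]\] (lazy "one or more" quantifiers); the helper name first_groups is made up for illustration.

```python
import re

# Assumed form of the question's pattern: lazy dot inside both groups.
LAZY = re.compile(r"\[\[(.+?)\]\[(.+?)\]\]")

def first_groups(pattern, text):
    """Return the captured groups of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.groups() if m else None

print(first_groups(LAZY, "[[two b][three c]]"))
# ('two b', 'three c')
print(first_groups(LAZY, "[[one a]]"))
# None
print(first_groups(LAZY, "[[one a]] and [[two b][three c]]"))
# ('one a]] and [[two b', 'three c')
```

In the last call the match starts at the leftmost [[ and the lazy dot keeps expanding, brackets included, until the first ][ shows up.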

So, one approach is to exclude any square brackets between [[ and ][ using a negated character class [^\]\[]:

\[\[([^\]\[]+)\]\[([^\]\[]+)\]\]

See the regex demo. Here, [^\]\[]+, which replaces .+?, matches one or more chars other than [ and ].
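A quick way to check the negated character class version is a short Python sketch like the one below. It assumes Python's re module; the variable names are only illustrative.

```python
import re

# Negated character class: the groups may not contain [ or ].
NEGATED = re.compile(r"\[\[([^\]\[]+)\]\[([^\]\[]+)\]\]")

combined = "[[one a]] and [[two b][three c]]"

# findall returns one tuple of groups per match.
negated_hits = NEGATED.findall(combined)
print(negated_hits)
# [('two b', 'three c')]

# The single-bracket-pair string still yields no match.
print(NEGATED.findall("[[one a]]"))
# []
```

Because the class cannot cross a bracket, the attempt starting at [[one a fails at ]] and the engine moves on to the second [[.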

Another approach is the one you mention, namely, match any char that does not start a [[ sequence (and probably ]], too) before ][:

\[\[((?:(?!\[\[).)*?)\]\[(.*?)\]\]
\[\[((?:(?!\[\[|\][\]\[]).)*)\]\[(.*?)\]\]

See this regex demo.

The (?:(?!\[\[).)*? part matches any char (.), zero or more but as few as possible occurrences (*?), that does not start a [[ char sequence ((?!\[\[)).

The (?:(?!\[\[|\][\]\[]).)* part matches any char (.), zero or more but as many as possible occurrences (*), that does not start a [[, ]] or ][ char sequence ((?!\[\[|\][\]\[])).
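The two lookahead-based patterns above can be compared with a small Python sketch (again assuming Python's re module; the names TEMPERED_OPEN, TEMPERED_ALL and the extra test string are made up for illustration). Both give the same result on the combined string, but they differ when a ]] appears before the first ][.

```python
import re

# Variant 1: the first group may not run into a new "[[".
TEMPERED_OPEN = re.compile(r"\[\[((?:(?!\[\[).)*?)\]\[(.*?)\]\]")
# Variant 2: the first group stops before "[[", "]]" or "][".
TEMPERED_ALL = re.compile(r"\[\[((?:(?!\[\[|\][\]\[]).)*)\]\[(.*?)\]\]")

combined = "[[one a]] and [[two b][three c]]"
print(TEMPERED_OPEN.findall(combined))
# [('two b', 'three c')]
print(TEMPERED_ALL.findall(combined))
# [('two b', 'three c')]

# Hypothetical input where "]]" occurs before the first "][".
tricky = "[[a]] x][b]]"
print(TEMPERED_OPEN.findall(tricky))
# [('a]] x', 'b')]
print(TEMPERED_ALL.findall(tricky))
# []
```

So the first variant only guards against a new [[, while the second also refuses to cross ]] inside the first group, which is closer to the behaviour of the negated character class.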

Depending on the regex flavor, you can get rid of some backslashes in this regex.
