Regex - nested matches with multiple ending

Time:02-14

This is my first question on Stack Overflow, so please bear with me. I am also not a native English speaker. I am having the following difficulty with my regular expression:

This is the string (the "true" and "false" values are not actually in the string, but they simplify the example):

**[if true]**
    [if true]
        [if false]
        [else]
        [/if]
    *[elseif false]*
        [if true]
        [/if]
    [else]
    [/if]
**[elseif false]**
[/if]
**[if false]**
    [if false]
    [else]
    *[/if]*
**[else]**
    [if true]
    [/if]
[/if]

I marked the matches I want with ** and the ones I actually get with *.

In this situation I only want to match each outermost [if XXX] together with its corresponding end, which can be [else], [elseif XXX] or [/if]. For now I do not care about the inner [if XXX] blocks: when the parent is false, I do not need to check them.

When running my regex:

/\[if (.*?)\](((?R)|.)*?)(\[\/if\]|\[else\]|\[elseif )/gs 

it matches the parent [if XXX] tags, but pairs each of them with an incoherent combination of the [elseif XXX], [else] and [/if] tags inside it.

As groups I need, for every outer [if XXX]: the whole match, the XXX condition, the content between [if XXX] and its matching [END], and the [END] itself.
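For comparison, the requirement described above can also be met without a recursive regex. The following is only a minimal Python sketch under that assumption, not the method used in the question or in the answers: it tokenises the tags with the standard re module and tracks the nesting depth with a counter. The names TAG, SAMPLE and outer_branches are made up for illustration.

```python
import re

# Tags recognised by this sketch: [if X], [elseif X], [else], [/if].
TAG = re.compile(r"\[(if|elseif) ([^\]]*)\]|\[(else)\]|\[(/if)\]")

def outer_branches(text):
    """Yield (condition, content, end_tag) for each top-level [if ...]."""
    depth = 0      # how many [if ...] blocks are currently open
    start = None   # match object of the top-level [if ...] still waiting for its end
    for m in TAG.finditer(text):
        kind = m.group(1) or m.group(3) or m.group(4)
        if kind == "if":
            depth += 1
            if depth == 1:
                start = m
        elif depth == 1 and start is not None:
            # First [elseif X], [else] or [/if] that belongs to the outer [if ...].
            yield start.group(2), text[start.end():m.start()], m.group(0)
            start = None
        if kind == "/if" and depth > 0:
            depth -= 1

# The string from the question, without the ** and * markers.
SAMPLE = """\
[if true]
    [if true]
        [if false]
        [else]
        [/if]
    [elseif false]
        [if true]
        [/if]
    [else]
    [/if]
[elseif false]
[/if]
[if false]
    [if false]
    [else]
    [/if]
[else]
    [if true]
    [/if]
[/if]
"""

for condition, content, end_tag in outer_branches(SAMPLE):
    print(condition, end_tag)
# prints:
# true [elseif false]
# false [else]
```

The depth counter plays the role that recursion plays in the regex: inner tags are skipped simply because they are seen while depth is greater than 1.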

Since I do not fully understand recursion, I'd appreciate your help. Many thanks in advance!

You can try the regex here: https://regex101.com/r/lnTh9M/1

CodePudding user response:

This might be close?
It also captures the outer closing tag, but I don't see how to avoid that without breaking the recursion.

\[(if|elseif|else) ?(.*?)\](((?R)|[^\[\]])*?)(?:.(?=\[else.*?\])|\[\/if\])


CodePudding user response:

When a pattern starts to get complicated, two features can help:

  • verbose mode (the x modifier)
  • references to subpatterns or, better, references to named subpatterns ( \g<name> )

With these two features, the pattern often becomes clearer and easier to build:

~
\[if \s  [^]]* ]

(?<content> [^[]*  (?: (?R) [^[]* )*  )
(?: \[elseif \s  [^]]* ] \g<content> )* 
(?: \[else] \g<content> )? 

\[/if]
~x


Note that (?R) is nothing more than a reference to a subpattern, except that this time the subpattern is the whole pattern.
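One way to picture this: a recursive pattern behaves much like a function that calls itself. The Python sketch below is a hand-written approximation of the verbose pattern above, using only the standard re module (which has no (?R) of its own). The names match_if, match_content, IF_OPEN, ELSEIF and TEXT are illustrative, not part of any library, and the sketch does no backtracking, which should be enough here because every alternative starts with a different tag.

```python
import re

IF_OPEN = re.compile(r"\[if\s[^\]]*\]")      # \[if \s [^]]* ]
ELSEIF = re.compile(r"\[elseif\s[^\]]*\]")   # \[elseif \s [^]]* ]
TEXT = re.compile(r"[^\[]*")                 # [^[]*

def match_if(s, pos=0):
    """Whole pattern, i.e. what (?R) refers to.

    Return the index just past one balanced [if ...] ... [/if] block
    starting at pos, or None if there is no such block.
    """
    m = IF_OPEN.match(s, pos)
    if m is None:
        return None
    pos = match_content(s, m.end())
    while True:                               # (?: \[elseif ...] \g<content> )*
        m = ELSEIF.match(s, pos)
        if m is None:
            break
        pos = match_content(s, m.end())
    if s.startswith("[else]", pos):           # (?: \[else] \g<content> )?
        pos = match_content(s, pos + len("[else]"))
    if s.startswith("[/if]", pos):            # \[/if]
        return pos + len("[/if]")
    return None

def match_content(s, pos):
    """The 'content' subpattern: [^[]* (?: (?R) [^[]* )*"""
    pos = TEXT.match(s, pos).end()
    while True:
        end = match_if(s, pos)                # this call is the (?R)
        if end is None:
            return pos
        pos = TEXT.match(s, end).end()

block = "[if a][if b][else][/if][elseif c][else][/if]"
print(match_if(block + "rest") == len(block))  # prints True
```

The call to match_if inside match_content corresponds to (?R), and the calls to match_content correspond to \g<content>.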
