Match [tag] and {{tag}} using alternate group and lookahead

Time:06-26

I am aware this can easily be matched with /(\[tag\]|{{tag}})/, but that is not what I'm looking for. This is the beginning of a longer regex I'm trying to write (attribute parsing, etc.).

My criteria are:

  1. It has to be a JavaScript regex (no PCRE).
  2. The part in between the wrappers can only be defined once.

I have the following regex working, but I am not comfortable with it, because it does not seem to do what it says when you read it:

For readability, I am going to pretend that JavaScript has the "x" flag here. I know that it doesn't.

(?:(\[)|{{)     // Group with alternatives but only capturing group 1 so we can look around
(tag)           // Capture the part between the wrappers
(?:             // I don't care about capturing the closing wrapper, just match it
  (?=\1)}}      // Closing condition A (works, but why?)
  |             // OR
  (?!\1)]       // Closing condition B (works, but why?)
)

As you can see it works here:

let input = `

[tag][ <--
[tag[
[tag}}
{{tag]
{{tag}} <--
tag}}
{{tag
[tag
tag]

`;

input = input.replace(/(?:(\[)|{{)(tag)((?=\1)}}|(?!\1)])/g, "@@@@@");

console.log(input);

If I try to eye-parse the regex, shouldn't closing condition A be the one that matches ] when \1 is set?

However, if I write it that way, I end up matching the wrong things; only the above configuration works.

Any regex gurus want to illuminate me?

Thanks, voldomazta

CodePudding user response:

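The first match ([tag]) uses closing condition B: group 1 holds [, and the next character is ], which isn't \1, so the negative lookahead (?!\1) succeeds and the ] is then matched. Condition A cannot apply here, because (?=\1) would require the next character to be [ while }} has to match at the same position.

The second match ({{tag}}) uses closing condition A: group 1 did not participate in the match, and in JavaScript a backreference to a group that did not participate matches the empty string. The lookahead (?=\1) is therefore always satisfied, and (?!\1) always fails, so only }} can close the tag.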
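
The behaviour described above can be checked in isolation. The following is only a minimal sketch: the names positive and negative are made up for illustration, and the braces are escaped so the patterns would also be accepted with the u flag.

```javascript
// Group 1 captures "[" only when the opener is "[".
// With "{{" as the opener, group 1 does not participate, and \1 matches the empty string.
const positive = /(?:(\[)|\{\{)(?=\1)/; // opener followed by (?=\1)
const negative = /(?:(\[)|\{\{)(?!\1)/; // opener followed by (?!\1)

// Opener "{{": \1 is empty, so (?=\1) always succeeds and (?!\1) always fails.
console.log(positive.test("{{")); // true
console.log(negative.test("{{")); // false

// Opener "[": \1 is "[", so the lookaheads test whether the next character is "[".
console.log(positive.test("[[")); // true
console.log(positive.test("[]")); // false
console.log(negative.test("[]")); // true
```

This is why the closing conditions look inverted when read naively: (?=\1) effectively means "the opener was {{", and (?!\1) means "the opener was [ and the next character is not [".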

Since JavaScript regex doesn't support conditionals (as C# does), I would go for the pattern in your first line. To define the tag part only once, I would put the tag pattern in a string variable and build the regex with the RegExp constructor, which takes a string. It is then easy to insert the same string twice, and you still have only one pattern to maintain.
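
A minimal sketch of that suggestion follows. The names TAG, tagPattern, sample and result are hypothetical, and the sample string is just a condensed version of the test input from the question.

```javascript
// The part between the wrappers is defined exactly once.
const TAG = "tag";

// Build /\[tag\]|\{\{tag\}\}/g from the string; backslashes are doubled
// because they first pass through string-literal escaping.
const tagPattern = new RegExp(
  "\\[" + TAG + "\\]" + "|" + "\\{\\{" + TAG + "\\}\\}",
  "g"
);

const sample = "[tag][ [tag}} {{tag] {{tag}} tag]";
const result = sample.replace(tagPattern, "@@@@@");

console.log(result); // "@@@@@[ [tag}} {{tag] @@@@@ tag]"
```

Only the properly paired forms are replaced. If TAG later grows into a longer sub-pattern (attribute parsing, etc.), it still lives in one place.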
