PCRE Regex - Backreference not working inside lookahead or after pipe

Time:10-24

My regex query is the following (demo):

(?'a'[~_])(?=(?!\s)(?:(?!\k'a').)+(?<!\s)\k'a')|(?=(?=(?'b'[\s\S]*))(?'c'\k'a'(?!\s)(?:(?!\k'a').)+(?<!\s)(?=\k'b'\z)|(?<=(?=x^|(?&c))[\s\S])))\k'a'

The problem I'm facing is that backreferences to the named capture group (?'a'[~_]) fail to match in the part of the regex on the right side of the main pipe:

(?=(?=(?'b'[\s\S]*))(?'c'\k'a'(?!\s)(?:(?!\k'a').)+(?<!\s)(?=\k'b'\z)|(?<=(?=x^|(?&c))[\s\S])))\k'a'

They do however work on the part to the left of the pipe:

(?'a'[~_])(?=(?!\s)(?:(?!\k'a').)+(?<!\s)\k'a')

The purpose of the query is to match only the surrounding delimiters of strings such as ~test~ or _test_, with a few additional criteria, which it does by first matching the opening delimiter with a lookahead (demo), and then using a variable length lookbehind to match the closing delimiter (demo with literals instead of backreferences).

While I am aware the regex could be greatly simplified using \K or capture groups, neither is an option for me.

Answer:

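Your regex is almost there; it only needs a small correction. Move the alternation inside the lookahead, so that group 'a' is captured before either branch runs. The layout below assumes the x (extended) flag, which ignores whitespace in the pattern: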

(?'a'[~_])(?=
   (?'d'(?!\s)(?:(?!\k'a').)+(?<!\s)\k'a') |
   (?=(?'b'.*))(?'c'
      ^(?>\k'a'(?&d)|.)*\k'a'(?&d)(?=\k'b'\z) |
      (?<=(?=x^|(?&c)).)
   )
)

Demo

However, I expect a regex like this to perform poorly.
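
As for why the original fails: in PCRE, a backreference to a group that has not taken part in the match fails by default, and in the question 'a' is only captured in the left alternative. Here is a minimal sketch of that behaviour in Python, whose re module treats unset groups the same way. The two patterns are simplified, hypothetical stand-ins rather than the regex from the question, and Python spells the backreference (?P=a) instead of \k'a'.

```python
import re

# Hypothetical, simplified patterns for illustration only.

# 'a' is captured only in the left alternative, so the backreference in the
# right alternative points at a group that never took part in the match.
split_branches = re.compile(r"(?P<a>[~_])[a-z]+|[a-z]+(?P=a)")

# Capturing 'a' before the alternation keeps it set in both branches.
shared_capture = re.compile(r"(?P<a>[~_])(?:[a-z]+(?P=a)|[0-9]+(?P=a))")

print(split_branches.search("test~"))            # None
print(shared_capture.search("~test~").group())   # ~test~
print(shared_capture.search("_123_").group())    # _123_
```

The corrected regex above is built the same way as shared_capture: the delimiter is captured once, and both branches sit after it.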
