PCRE Regex - Match only brackets excluding enclosed content

Time:10-20

I'm trying to match a pair of special characters, while excluding the enclosed content from the match. For example, ~some enclosed content~ should match only the pair of ~ and leave out some enclosed content entirely. I can only use vanilla PCRE, and capture groups aren't an option.

My match criterion for the entire string is ~([^\s].*?(?<!\s))~. Matching the first and second ~ separately would also be acceptable.

CodePudding user response:

Looking at your pattern, you want a non-whitespace char right after the opening ~ and a non-whitespace char right before the closing ~.

Since ~ is the delimiter, those non-whitespace chars should also not be ~ themselves, so you might use:

~(?=[^~\s](?:[^~\r\n]*[^\s~])?~)|(?<=~)[^\s~](?:[^~\r\n]*[^\s~])?\K~

Explanation

  • ~ Match literally
  • (?= Positive lookahead, assert that to the right is
    • [^~\s] Match a non-whitespace char except for ~
    • (?: Non-capture group
      • [^~\r\n]*[^\s~] Match zero or more chars other than ~ or a newline (\r or \n), followed by a non-whitespace char except for ~
    • )? Close the non-capture group and make it optional (so a single char, as in ~a~, also matches)
    • ~ Match literally
  • ) Close the lookahead
  • | Or
  • (?<=~) Positive lookbehind, assert ~ to the left
  • [^\s~] Match a non-whitespace char except for ~
  • (?:[^~\r\n]*[^\s~])? Same optional pattern as in the lookahead
  • \K Reset the start of the reported match, so what was matched so far is left out of the result
  • ~ Match literally
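
If you want to check which offsets the two branches report, here is a rough sketch in Python. Note the assumptions: Python's standard `re` module does not support \K, so the second branch is rewritten with a capturing group purely as a stand-in for testing, and the helper name `tilde_spans` is made up for this example. In real PCRE you would use the pattern above as-is, without any group.

```python
import re

# Branch 1 is the lookahead branch from the pattern above, unchanged.
# Branch 2 swaps PCRE's \K for a capturing group, because Python's
# standard `re` has no \K; this is only a stand-in for checking offsets.
TILDE_PAIR = re.compile(
    r"~(?=[^~\s](?:[^~\r\n]*[^\s~])?~)"
    r"|(?<=~)[^\s~](?:[^~\r\n]*[^\s~])?(~)"
)


def tilde_spans(text):
    """Return (start, end) offsets of each delimiter the pattern reports."""
    spans = []
    for m in TILDE_PAIR.finditer(text):
        # Group 1 is set only when the closing-tilde branch matched.
        spans.append(m.span(1) if m.group(1) is not None else m.span(0))
    return spans


print(tilde_spans("a ~some enclosed content~ b"))  # [(2, 3), (24, 25)]
print(tilde_spans("~ spaced ~"))  # []
```

The second call returns nothing because the char right after the opening ~ is whitespace, which is the case the original criterion was meant to reject.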

