Regular expression to match strings for syntax highlighter

Time:04-06

I'm looking for a regular expression that matches strings for a syntax highlighter used in a code editor. I've found

(")(?:(?!\1|\\).|\\.)*\1

in the linked question regex-grabbing-values-between-quotation-marks. (I changed the beginning because I only need to match double quotes, not single quotes.)

The above regular expression correctly matches the following example, which contains escaped double quotes and escaped backslashes:

"this is \" just  a test\\"

Most code editors, however, also highlight open-ended (unterminated) strings, such as the second string in the following example:

"this must \" match\\" this text must not be matched "this text must be matched as well

Is it possible to alter the above regular expression so that it also matches the open-ended string? Another possibility would be a second regular expression that matches only the open-ended string, such as

"[^"]*$ but matching only if it is preceded by an even number of unescaped quotes

CodePudding user response:

You could add an alternation at the end of your current pattern that either matches the backreference to group 1 or asserts the end of the string.

(")(?:(?!\1|\\).|\\.)*(?:\1|$)

But since you are only capturing a single character ("), you can omit the capture group and, instead of the backreference \1, just match " directly.

The pattern can then be written as:

"[^"\\]*(?:\\.[^"\\]*)*(?:"|$)

See a regex demo.
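As a quick check, here is a minimal Python sketch that runs both patterns against the open-ended example from the question. The use of Python's re module and the variable names are my own assumptions for illustration; the question does not say which regex engine the editor uses.

```python
import re

# Pattern with capture group and backreference, closing quote or end of input.
BACKREF_RE = re.compile(r'(")(?:(?!\1|\\).|\\.)*(?:\1|$)')

# Equivalent pattern without the capture group.
UNROLLED_RE = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*(?:"|$)')

# The example line from the question (raw string, so backslashes are literal).
sample = r'"this must \" match\\" this text must not be matched "this text must be matched as well'

backref_matches = [m.group(0) for m in BACKREF_RE.finditer(sample)]
unrolled_matches = [m.group(0) for m in UNROLLED_RE.finditer(sample)]

for text in unrolled_matches:
    print(text)
# "this must \" match\\"
# "this text must be matched as well
```

Both patterns return the closed string and the open-ended string, and skip the text in between. Note that in Python $ only matches at the very end of the input (or just before a final newline) and . does not match a newline, so for multi-line input you would probably want to apply the pattern line by line or add re.MULTILINE.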


If the match should not start at an escaped quote (\") and lookbehind is supported:

(?<!\\)"[^"\\]*(?:\\.[^"\\]*)*(?:"|$)

This pattern matches:

  • (?<!\\) Negative lookbehind, assert that there is no \ directly to the left
  • " Match the double quote
  • [^"\\]* Match zero or more chars other than " or \
  • (?:\\.[^"\\]*)* Optionally repeat matching \ followed by any char, and then zero or more chars other than " or \
  • (?:"|$) Match either " or assert the end of the string
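To see what the lookbehind changes, here is a small Python sketch comparing the two variants on input that contains escaped quotes outside of any string. The sample text and variable names are made up for illustration, and Python's re module is an assumption, not something stated in the question.

```python
import re

PLAIN_RE = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*(?:"|$)')
LOOKBEHIND_RE = re.compile(r'(?<!\\)"[^"\\]*(?:\\.[^"\\]*)*(?:"|$)')

# Hypothetical input: two escaped quotes first, then one real string.
sample = r'say \"hi\" then "real string"'

without_lookbehind = [m.group(0) for m in PLAIN_RE.finditer(sample)]
with_lookbehind = [m.group(0) for m in LOOKBEHIND_RE.finditer(sample)]

print(without_lookbehind)  # two matches, the first starting at an escaped quote
print(with_lookbehind)     # one match: the real string
```

Without the lookbehind, the first match starts at the escaped quote and swallows the opening quote of the real string, which leaves the final quote to be matched as an open-ended string on its own. With the lookbehind, only "real string" is matched. One caveat: the lookbehind only inspects a single character, so a quote that directly follows an escaped backslash (\\") outside a string would also be skipped.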