Regex to replace Markdown characters not inside backticks

Time:10-05

I'm trying to write a regex that escapes every Markdown-specific character ([\*_{}[\]()# \-.!`]) that is not inside single or triple backticks. This is being implemented in JavaScript. A couple of examples:

  1. foo `bar` baz.qux `quux` -> should replace . in baz.qux with \.

  2. foo `bar.baz` foo_bar -> should replace _ in foo_bar with \_, but leave the . in bar.baz alone

This is what I have right now: markdown.replace(/[\\*_{}[\]()# \-.!`]/g, '\\$&'), but it also matches characters inside backticks, such as the . in `foo.bar`.

Thanks in advance for the help!

CodePudding user response:

You can use

markdown.replace(/(`(?:``)?).*?\1|[\\*_{}[\]()# \-.!`]/g,
    (x, y) => y ? x : `\\${x}`)

See the JavaScript demo:

const markdown = 'foo `bar` baz.qux `quux` and foo `bar.baz` foo_bar';
console.log(
    markdown.replace(/(`(?:``)?).*?\1|[\\*_{}[\]()# \-.!`]/g, 
        (x, y) => y ? x : `\\${x}`)
)

Details:

  • (`(?:``)?).*?\1 - a single backtick or three backticks (captured into Group 1), then any zero or more chars other than line break chars, as few as possible, and then the same number of backticks as captured into Group 1 (the \1 backreference)
  • | - or
  • [\\*_{}[\]()# \-.!`] - a char from the set, \, *, _, {, }, [, ], (, ), #, , -, ., ! or backtick.
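To see what the first alternative consumes on its own, here is a small sketch. The sample strings and the names spanOneLine, spanAnyLine and spans are made up for illustration. It also shows a consequence of . not matching line breaks: a triple-backtick block that spans several lines is only matched as one span if the s (dotAll) flag is added, which is not part of the regex above.

```javascript
// Only the backtick-span alternative from the regex above.
const spanOneLine = /(`(?:``)?).*?\1/g;
// Same pattern with the s (dotAll) flag, so that . also matches line breaks.
const spanAnyLine = /(`(?:``)?).*?\1/gs;

// Hypothetical helper: collect every full match of re in text.
const spans = (re, text) => [...text.matchAll(re)].map(m => m[0]);

// Single and triple backticks on one line are both picked up.
const inlineSample = 'a `b.c` d ```e.f``` g';
const inlineSpans = spans(spanOneLine, inlineSample);
console.log(inlineSpans);

// A fenced block spanning several lines.
const fencedSample = '```\nx.y\n```';
const withoutDotAll = spans(spanOneLine, fencedSample); // x.y is in none of these matches
const withDotAll = spans(spanAnyLine, fencedSample);    // the whole block is one match
console.log(withoutDotAll);
console.log(withDotAll);
```

If the input can contain multi-line fenced blocks, adding the s flag (or using [\s\S]*? in place of .*?) is one way to keep their contents from being escaped.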

The replacement is a callback function, where x is the whole match value and y is Group 1. If Group 1 matched (see y ?), the whole match (the backtick-delimited span) is returned unchanged; otherwise, the return value is a backslash followed by the matched char.
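Wrapped as a reusable function, the approach might look like the following sketch. The name escapeMarkdown is hypothetical. The character class uses + where the class above shows a blank, on the assumption that the blank is a + that got lost (+ is in the usual list of Markdown characters to escape); put the space back if spaces really should be escaped.

```javascript
// Hypothetical helper; not part of any library.
function escapeMarkdown(markdown) {
  // Alternative 1: a span delimited by one or three backticks
  //                (Group 1 holds the opening delimiter).
  // Alternative 2: a single character to escape.
  // ASSUMPTION: "+" stands in for the blank shown in the class above.
  const pattern = /(`(?:``)?).*?\1|[\\*_{}[\]()#+\-.!`]/g;
  return markdown.replace(pattern, (match, ticks) =>
    // ticks is undefined when the second alternative matched.
    ticks ? match : `\\${match}`);
}

// The two examples from the question.
const first = escapeMarkdown('foo `bar` baz.qux `quux`');
const second = escapeMarkdown('foo `bar.baz` foo_bar');
console.log(first);
console.log(second);
```

For the two examples from the question this prints foo `bar` baz\.qux `quux` and foo `bar.baz` foo\_bar.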
