Regex for matching thematic breaks in markdown

Time:09-13

I'm trying to make a RegEx that matches all thematic breaks in a string for use in JavaScript's String.split function.

A thematic break can be:

  • Hyphens: ---
  • Asterisks: ***
  • Underscores: ___

There can be whitespace between the hyphens, asterisks or underscores, but you can't mix and match: for example, --* is not valid.

Full spec: https://spec.commonmark.org/0.30/#thematic-breaks

Here's what I've tried: /[-*_]{3,}/g, but that does not match breaks with whitespace in the middle. If I add a space there, it also matches things like --, which is not desirable. I also thought of stripping the whitespace first, but I'd like to fit it all into one RegEx.

Is this possible? And how?

CodePudding user response:

You can use this regex:

/^[ ]{0,3}([-*_])\s*\1\s*\1\s*$/gm

Explanation:

^ - match start of line

[ ]{0,3} - match up to 3 optional leading spaces

([-*_]) - match either -, * or _ and put it in a group

\s*\1\s*\1\s* - match the character from the first group two more times, with optional whitespace around each

$ - match end of line
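
As a quick sanity check, here is a minimal sketch that runs the three-character pattern (written with no literal space before the final \s*) against a few single lines. The helper name matchesExactlyThree and the sample strings are made up for illustration, and the g and m flags are dropped because each string is tested on its own.

```javascript
// Three-character pattern from the explanation above, without the g/m flags
// because every string tested here is a single line.
const exactlyThree = /^[ ]{0,3}([-*_])\s*\1\s*\1\s*$/;

// Hypothetical helper name, used only for this illustration.
function matchesExactlyThree(line) {
  return exactlyThree.test(line);
}

// Print each sample line next to whether the pattern accepts it.
for (const line of ["---", "* * *", "   ___", "--*", "--", "----"]) {
  console.log(JSON.stringify(line), matchesExactlyThree(line));
}
```

The first three lines match. --* mixes characters, -- is too short, and ---- has a fourth hyphen, which this version does not allow.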

Edit (from comment):

/^[ ]{0,3}([-*_])\s*(?:\1\s*){2,}$/gm

It now matches runs of three or more characters, as long as they are all the same character.

This group:

(?:\1\s*) is repeated 2 or more times.
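
A small sketch of how the {2,} version behaves on longer runs. The name isThematicBreak and the sample lines are invented for this illustration; the flags are again left off because each string is one line.

```javascript
// Revised pattern: one captured character, then the same character
// two or more additional times, with optional whitespace in between.
const thematicBreak = /^[ ]{0,3}([-*_])\s*(?:\1\s*){2,}$/;

// Hypothetical helper name, used only for this illustration.
function isThematicBreak(line) {
  return thematicBreak.test(line);
}

const shouldMatch = ["----", "* * * * *", "  _ _ _  ", "___________"];
const shouldNotMatch = ["--", "--*", "    ---", "--- a"];

console.log(shouldMatch.every((line) => isThematicBreak(line)));   // true
console.log(shouldNotMatch.some((line) => isThematicBreak(line))); // false
```

The line with four leading spaces is rejected because [ ]{0,3} allows at most three, and the next character must then be -, * or _.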

Examples of matches:

***
  - - -                   
 __            _
 ** * ** * ** * **

Examples of non matches:

*-_
abc
             

I should add that I use \s although the spec says 'space or tab'. \s also matches line breaks, so this is only safe because the text must be parsed line by line.
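
For the String.split use case from the question, one thing to keep in mind is that split includes the text of capturing groups in its result, so passing a pattern with ([-*_]) directly would leave stray -, * or _ entries in the array. Below is a rough line-by-line sketch that avoids this. The function name splitAtThematicBreaks is hypothetical, and I swapped \s for [ \t] to follow the spec's wording. It does not try to handle other CommonMark rules, such as --- under a paragraph being a setext heading, or breaks inside code blocks.

```javascript
// Hypothetical helper, not part of any library: splits text into sections
// at every line that looks like a thematic break.
function splitAtThematicBreaks(text) {
  // [ \t] instead of \s, so only spaces and tabs are allowed between characters.
  const isBreak = /^ {0,3}([-*_])[ \t]*(?:\1[ \t]*){2,}$/;
  const sections = [[]];
  for (const line of text.split(/\r?\n/)) {
    if (isBreak.test(line)) {
      sections.push([]); // start a new section and drop the break line itself
    } else {
      sections[sections.length - 1].push(line);
    }
  }
  // Join the collected lines of each section back into one string.
  return sections.map((lines) => lines.join("\n"));
}

const doc = "Intro\n\n* * *\n\nBody\n___\nEnd";
console.log(splitAtThematicBreaks(doc));
// [ 'Intro\n', '\nBody', 'End' ]
```

Blank lines around a break stay attached to the neighbouring sections; trim each section if that is not wanted.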

