I'm trying to write a regex that matches every thematic break in a string, for use with JavaScript's String.split function.
A thematic break can be one of:
- Hyphens: ---
- Asterisks: ***
- Underscores: ___
There can be whitespace between the hyphens, asterisks or underscores, but the characters can't be mixed; for example, --* is not valid.
Full spec: https://spec.commonmark.org/0.30/#thematic-breaks
Here's what I've tried: /[-*_]{3,}/g
But that doesn't match breaks with whitespace between the characters, and if I add a space to the character class it also matches things like --, which is not desirable. I also thought of stripping the whitespace first, but I'd like to do it all in one regex.
Is this possible? And how?
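A quick way to see why the attempt falls short is to test it on a few strings. The second pattern below is an assumption about what "adding a space" means, namely putting a space inside the character class:

```javascript
// The attempt from the question: three or more of -, * or _ in a row.
const naive = /[-*_]{3,}/;
console.log(naive.test("- - -"));     // false: spaced-out breaks are missed
console.log(naive.test("--*"));       // true: mixed characters are accepted
console.log(naive.test("foo***bar")); // true: no anchors, so it fires mid-line

// Assumed variant with a space added to the character class.
const spaced = /[-*_ ]{3,}/;
console.log(spaced.test("-- "));      // true: only two markers
console.log(spaced.test("a   b"));    // true: three spaces alone are enough
```

Both problems come from a character class treating every allowed character as interchangeable; the answer below fixes this with a capture group and a backreference.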
Answer:
You can use this regex:
/^[ ]{0,3}([-*_])\s*\1\s*\1+\s*$/gm
Explanation:
- ^ : match the start of a line
- [ ]{0,3} : match up to 3 optional spaces of indentation
- ([-*_]) : match -, * or _ and capture it in group 1
- \s*\1\s*\1+\s* : match optional whitespace, the captured character, optional whitespace, then the captured character one or more times, then optional trailing whitespace
- $ : match the end of a line
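A quick check of this first pattern, assuming the space in "\1 \s*" is a "+" lost in formatting (so the pattern ends in \1+\s*). The g and m flags are dropped here so that .test() is stateless and works on single lines:

```javascript
// First pattern from the answer, with the final marker written as \1+.
const firstTry = /^[ ]{0,3}([-*_])\s*\1\s*\1+\s*$/;

for (const line of ["***", "- - -", " ___", "- - ---", "- - - -", "--*"]) {
  console.log(JSON.stringify(line), firstTry.test(line));
}
// "***" true
// "- - -" true
// " ___" true
// "- - ---" true
// "- - - -" false  (only the last run of markers may repeat, with no gaps)
// "--*" false
```

The "- - - -" case is exactly what the edit below addresses: whitespace is only allowed between the first three markers.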
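Edit (from a comment):
/^[ ]{0,3}([-*_])\s*(?:\1\s*){2,}$/gm
This version accepts any number of markers (three or more) with whitespace anywhere between them, as long as they are all the same character: after the first marker, the non-capturing group (?:\1\s*) is repeated 2 or more times.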
Examples of matches:
***
- - -
__ _
** * ** * ** * **
Examples of non-matches:
*-_
abc
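The examples above can be checked directly against the edited pattern. The extra strings "--" and "    ***" are my own additions as edge cases (too few markers, and four spaces of indentation, which CommonMark treats as an indented code block):

```javascript
// Edited pattern from the answer, without the g/m flags so that
// .test() is stateless and ^/$ refer to the whole input string.
const thematicBreak = /^[ ]{0,3}([-*_])\s*(?:\1\s*){2,}$/;

const shouldMatch = ["***", "- - -", "__ _", "** * ** * ** * **", "- - - -"];
const shouldNotMatch = ["*-_", "abc", "--", "    ***"];

console.log(shouldMatch.every((s) => thematicBreak.test(s)));   // true
console.log(shouldNotMatch.some((s) => thematicBreak.test(s))); // false
```

Dropping the g flag matters for repeated .test() calls: with g, each call resumes from the regex's lastIndex, which can make alternating calls return different results on the same input.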
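I should add that I use \s although the spec says 'space or tab'. That is only safe if the text is parsed line by line: \s also matches newlines, so when the regex runs over a whole multi-line string (as with split and the m flag), it can match across several lines. Replacing \s with [ \t] follows the spec exactly.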
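Since \s also matches "\n", it is worth checking what happens when the pattern runs over a whole multi-line string instead of one line at a time. The [ \t] variant below is my own adjustment to follow the spec's "spaces or tabs" wording, not part of the answer:

```javascript
// With the m flag, ^ and $ match at line boundaries, but \s also matches "\n",
// so the \s-based pattern can treat three separate lines as one break.
const withWhitespace = /^[ ]{0,3}([-*_])\s*(?:\1\s*){2,}$/m;
// Spec-aligned variant (assumption): only spaces and tabs between markers.
const withSpacesTabs = /^[ ]{0,3}([-*_])[ \t]*(?:\1[ \t]*){2,}$/m;

const text = "-\n-\n-"; // three lines, each a lone "-" (e.g. empty list items)

console.log(withWhitespace.exec(text)[0] === text);       // true: one match spanning all three lines
console.log(withSpacesTabs.test(text));                   // false
console.log(withSpacesTabs.test("intro\n - - -\noutro")); // true
```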
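For the original goal of using String.split, one detail matters: split copies the text of every capturing group into its result array, and the backreference \1 needs a capturing group. A minimal sketch, where splitOnThematicBreaks is a hypothetical helper and the pattern is the edited regex with \s narrowed to [ \t]:

```javascript
// Hypothetical helper: split Markdown text on thematic-break lines.
function splitOnThematicBreaks(markdown) {
  const breakLine = /^[ ]{0,3}([-*_])[ \t]*(?:\1[ \t]*){2,}$/gm;
  // split() inserts each capturing group's text into the result, so every
  // break also contributes its marker ("-", "*" or "_") at an odd index.
  // Keeping only even indices leaves just the chunks between breaks.
  return markdown.split(breakLine).filter((_, i) => i % 2 === 0);
}

const doc = "Intro\n\n---\n\nMiddle\n* * *\nEnd";
console.log(doc.split(/^[ ]{0,3}([-*_])[ \t]*(?:\1[ \t]*){2,}$/gm).length); // 5 (3 chunks + 2 markers)
console.log(splitOnThematicBreaks(doc).map((s) => s.trim()));              // [ 'Intro', 'Middle', 'End' ]
```

A regex alone can't see surrounding context: per the CommonMark spec, a line of dashes directly under a paragraph line is a setext heading underline rather than a thematic break, and break-like lines inside fenced code blocks are not breaks at all. If those cases matter, a real Markdown parser is the safer route.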