Regex to find Markdown headings that don't end with anchors

Time:07-07

I am trying to search through Markdown files in VS Code for headings that do not have an anchor like [#some-anchor-name] at the end of the line. To clarify, here is the shape of the headings I'm looking for:

  • Headings that start with 1 to 4 # symbols (for example # or ###).
  • Headings that have any number of characters following the # symbols, such as ## My Big Heading.
  • Headings that do not end with the typical anchor pattern [#some-anchor-name].

Here are some regexes I've tried:

This one almost works, but it expects a literal space at the end of a heading that lacks an anchor, which won't always be the case:

^#{1,4}.*\s(?!\[#.*\])$

The regex above matches ## My Big Heading  (note the trailing space after the heading), which made me think I was going in the right direction.

When I remove the \s that looks for the literal space just before the anchor, the regex matches all my headings, even ones with anchors:

^#{1,4}.*(?!\[#.*\])$

For example, the regex above matches both ## My Big Heading and ## My Big Heading [#my-big-anchor].
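Both observations can be reproduced outside VS Code with a minimal Python sketch. It assumes Python's re module behaves like VS Code's search for these particular patterns (both support ^, $, {1,4} and (?!...)); other constructs may differ between engines. The sample lines are illustrative.

```python
import re

# The two attempts from the question, copied verbatim.
ATTEMPT_1 = r"^#{1,4}.*\s(?!\[#.*\])$"
ATTEMPT_2 = r"^#{1,4}.*(?!\[#.*\])$"

samples = [
    "## My Big Heading [#my-big-anchor]",
    "## My Big Heading",
    "## My Big Heading ",  # same heading with a trailing space
]

for name, pattern in (("attempt 1", ATTEMPT_1), ("attempt 2", ATTEMPT_2)):
    for line in samples:
        # re.search returns a Match object, or None when nothing matches
        found = re.search(pattern, line) is not None
        print(f"{name}: {line!r} -> {found}")
```

Attempt 1 only matches the line with the trailing space. Attempt 2 matches every line because .* runs to the end of the line, and at that position there is nothing left for (?!\[#.*\]) to reject, so the lookahead always succeeds.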

To summarize, I'd like my regex to find line #2 below:

## My Big Heading [#my-big-anchor]
## My Big Heading

I looked at a variety of discussions on matching strings that don't contain a particular pattern, but since I'm not matching a particular word at the end of the headings, they don't seem to apply.

Answer:

With your current pattern, .* first matches to the end of the line, then backtracks until \s can match a whitespace char (trying the rightmost one first), and then asserts that [#...] is not directly to the right of it.

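For ## My Big Heading, that assertion is true at the space between Big and Heading, but the $ anchor right after it cannot match because Heading still follows. The same happens at every other space, so the line is not matched at all.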


You could move the end-of-string anchor into the lookahead assertion:

^#{1,4}\s(?!.*\[#.*\]$).*

Explanation

  • ^ Start of string
  • #{1,4} Match a # char 1-4 times
  • \s Match a whitespace char
  • (?!.*\[#.*\]$) Negative lookahead, assert from the current position that the string does not end with [#...]
  • .* Match the rest of the line


Answer:

What you should avoid is .* (zero or more of any character except a newline). Use [^\[]* (zero or more of any character except an opening square bracket) instead.

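This is because .* can also consume the [#...] anchor itself, so in a pattern like ^#{1,4}.*(?!\[#.*\])$ the lookahead is checked at the end of the line, where there is nothing left for it to reject.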

As long as your normal headings do not contain an opening square bracket, you can use the following simple regex, which needs no negative lookahead at all: ^#{1,4}[^\[]+$
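A minimal Python sketch of the bracket-free approach, assuming the intended pattern is ^#{1,4}[^\[]+$ (one or more non-[ characters after the # symbols). The third sample line is hypothetical and illustrates the stated condition: a heading that contains [ anywhere is skipped, even when it has no anchor.

```python
import re

# Assumed form of the answer's pattern: 1-4 '#' chars followed by one or more
# characters that are not '['. No lookahead is involved.
NO_BRACKET = re.compile(r"^#{1,4}[^\[]+$")

lines = [
    "## My Big Heading [#my-big-anchor]",
    "## My Big Heading",
    "# Title with [a link](https://example.com)",
]

for line in lines:
    # Each line is matched on its own, so $ means "end of this line".
    print(bool(NO_BRACKET.match(line)), line)
```

Matching one line at a time matters here: [^\[] also matches a newline character, so running this pattern over a whole file with re.MULTILINE can produce a match that spans several lines. Excluding the newline too, with [^\[\n], avoids that.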
