I am trying to search through Markdown files in VS Code for headings that do not have an anchor, like this [#some-anchor-name], at the end of their lines. To clarify, here is the shape of the headings I'm looking for:
- Headings that start with 1 to 4 # symbols (for example # or ###).
- Headings that have any number of characters following the # symbols, such as ## My Big Heading
- Headings that do not end with the typical anchor pattern [#some-anchor-name]
Here are some regexes I've tried:
This one almost works but it expects a literal space at the end of the heading with the missing anchor, which won't always be the case:
^#{1,4}.*\s(?!\[#.*\])$
The regex above matches "## My Big Heading " (note the trailing space after the heading), which made me think I was going in the right direction.
I tried removing the search for the literal space just prior to the anchor and it matches on all my headings--even ones with anchors:
^#{1,4}.*(?!\[#.*\])$
For example, the regex above matches both ## My Big Heading and ## My Big Heading [#my-big-anchor]
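The behaviour of the two attempts can be reproduced outside VS Code with a short Python check. This is only a sketch: it assumes Python's re module treats these particular patterns the same way VS Code's search does, and the sample lines and the helper name matching_lines are made up for illustration.

```python
import re

# The two patterns tried above, copied verbatim.
WITH_SPACE = re.compile(r"^#{1,4}.*\s(?!\[#.*\])$")
WITHOUT_SPACE = re.compile(r"^#{1,4}.*(?!\[#.*\])$")

# Hypothetical sample lines; the last one ends with a trailing space.
SAMPLES = [
    "## My Big Heading [#my-big-anchor]",
    "## My Big Heading",
    "## My Big Heading ",
]


def matching_lines(pattern, lines):
    """Return the lines on which the pattern finds a match."""
    return [line for line in lines if pattern.search(line)]


# First attempt: only the heading with a trailing space is found.
print(matching_lines(WITH_SPACE, SAMPLES))
# Second attempt: every heading is found, anchored or not.
print(matching_lines(WITHOUT_SPACE, SAMPLES))
```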
To summarize, I'd like my regex to find line #2 below:
## My Big Heading [#my-big-anchor]
## My Big Heading
I looked at a variety of discussions on matching strings that don't have a particular pattern, but since I'm not matching a particular word at the end of the headings, they don't seem to apply:
- Regular expression to match a line that doesn't contain a word
- https://superuser.com/questions/1279062/regex-matching-line-not-containing-the-string
CodePudding user response:
With your current pattern, the .*\s first matches until the end of the string, then backtracks to the last whitespace char and asserts that [#...] is not directly to the right. In a heading without a trailing space, that whitespace char is the space between Big and Heading. The assertion is true there, but the $ anchor right after it cannot match, because the end of the string does not follow directly, so the overall match fails.
You could write the pattern with the end of the string in the lookahead assertion:
^#{1,4}\s(?!.*\[#.*\]$).*
Explanation
- ^ Start of string
- #{1,4} Match a # char 1-4 times
- \s Match a whitespace char
- (?!.*\[#.*\]$) Negative lookahead, assert from the current position that the string does not end with [#...]
- .* Match the rest of the line
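A minimal sketch of the pattern above applied to a whole document in Python, assuming the re.MULTILINE flag so that ^ and $ work per line (VS Code's search already works line by line). The function name headings_without_anchor and the sample text are illustrative, not part of the question.

```python
import re

# Pattern from the answer; MULTILINE makes ^ and $ match at every line boundary.
MISSING_ANCHOR = re.compile(r"^#{1,4}\s(?!.*\[#.*\]$).*", re.MULTILINE)


def headings_without_anchor(markdown_text):
    """Return every level 1-4 heading line that does not end with [#...]."""
    return [m.group(0) for m in MISSING_ANCHOR.finditer(markdown_text)]


sample = "## My Big Heading [#my-big-anchor]\n## My Big Heading\n"
print(headings_without_anchor(sample))
```

Because the lookahead runs right after the # symbols and the whitespace char, it scans the whole remainder of the line before anything is consumed, which is why no trailing space is needed.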
CodePudding user response:
What you should avoid is using .* (zero or more of any character except a newline); use [^\[]* instead (zero or more of any character except an opening square bracket).
This is because a greedy .* runs to the end of the line and consumes the anchor as well, so in your second attempt the negative look-ahead is only tested at the end of the line, where it always succeeds.
As long as your normal headings do not contain an opening square bracket character, you can use the following simple regex: ^#{1,4}[^\[]+$. It does not need a negative look-ahead assertion at all.
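A hedged sketch of this bracket-free approach in Python, assuming the character class is followed by a + quantifier. The text is checked line by line, because in Python a negated class such as [^\[] also matches newlines and could otherwise run across several lines. The function name headings_without_bracket is made up for illustration.

```python
import re

# Assumed form of the pattern: 1-4 # symbols, then one or more non-"[" chars.
NO_BRACKET_HEADING = re.compile(r"^#{1,4}[^\[]+$")


def headings_without_bracket(markdown_text):
    """Return the lines that start with # and contain no opening bracket."""
    return [
        line
        for line in markdown_text.splitlines()
        if NO_BRACKET_HEADING.search(line)
    ]


sample = "## My Big Heading [#my-big-anchor]\n## My Big Heading\n"
print(headings_without_bracket(sample))
```

Two consequences of this design: a heading that contains any other bracketed text, such as a link, is skipped, and because # is itself "not an opening bracket", headings with five or more # symbols are matched as well.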