Find :: outside of markdown code formatting

Time:09-20

I have a bunch of Markdown files in which I want to find Ruby's double colon (::) wherever it appears outside code formatting, i.e. where I forgot to apply the proper Markdown. For example:

`foo::bar`

hello `foo::bar` test
`  example::with::whitespace  `

```
Proper::Formatted
```

```
  Module::WithIndendation
```

```
Some::Nested::Modules
```

```ruby
CodeBlock::WithSyntax
```

# Some::Class

## Another::Class Heading
some text

The regex should match only Some::Class and Another::Class, because they lack the surrounding backticks and are not inside a multi-line fenced code block.

I have this regex, but it also matches inside the multi-line code blocks:

[\s]+[^`]+(::)[^`]+[\s]?

Any idea how to exclude this?

EDIT: It would be great if the regex worked in Ruby, in JS, and on the command line with grep.

CodePudding user response:

For the original input, you may use this regex in Ruby to match a :: string that is

  1. not preceded by a ` and

  2. not preceded by a ` followed by a whitespace character:

Regex:

(?<!`\s)(?<!`)\b\w+::\w+

RegEx Demo 1

RegEx Breakup:

  • (?<!`\s): Negative lookbehind to assert that a ` followed by a whitespace character is not at the preceding position
  • (?<!`): Negative lookbehind to assert that a ` is not at the preceding position
  • \b: Match a word boundary
  • \w+: Match 1+ word characters
  • ::: Match a ::
  • \w+: Match 1+ word characters
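
To see how the two lookbehinds in the breakup above behave, here is a minimal sketch. It is written in JavaScript only so that it can be run quickly, and it assumes a runtime with lookbehind support (for example a recent Node.js). The helper name `findUnformatted` is made up for illustration. The last call shows a limit of the fixed-width lookbehinds: with two spaces after the backtick, neither assertion applies, so the pattern still matches.

```javascript
// The pattern from the breakup above, as a JavaScript regex literal.
// Assumption: the runtime supports lookbehind assertions.
const outsideInlineCode = /(?<!`\s)(?<!`)\b\w+::\w+/g;

// Hypothetical helper: return every match in `text`, or an empty array.
function findUnformatted(text) {
  return text.match(outsideInlineCode) ?? [];
}

// Heading without backticks: reported.
console.log(findUnformatted("# Some::Class"));
// Backtick directly before the word: rejected by (?<!`).
console.log(findUnformatted("hello `foo::bar` test"));
// Backtick plus exactly one space: rejected by (?<!`\s).
console.log(findUnformatted("` foo::bar `"));
// Backtick plus two spaces: neither lookbehind applies, so this is reported.
console.log(findUnformatted("`  foo::bar  `"));
```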

You can use this regex in JavaScript:

(?<!`\w*\s*|::)\b\w+(?:::\w+)+

RegEx Demo 2
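
As a quick check, the JavaScript pattern above can be run against the sample from the question. This is a minimal sketch, assuming a runtime with variable-length lookbehind (for example a recent Node.js). The variable names `sample`, `pattern` and `matches` are illustrative only.

```javascript
// The sample from the question, one array entry per line.
const sample = [
  "`foo::bar`",
  "",
  "hello `foo::bar` test",
  "`  example::with::whitespace  `",
  "",
  "```",
  "Proper::Formatted",
  "```",
  "",
  "```",
  "  Module::WithIndendation",
  "```",
  "",
  "```",
  "Some::Nested::Modules",
  "```",
  "",
  "```ruby",
  "CodeBlock::WithSyntax",
  "```",
  "",
  "# Some::Class",
  "",
  "## Another::Class Heading",
  "some text",
].join("\n");

// Skip a word that follows a backtick (optionally with word characters and
// whitespace in between) or that follows "::"; otherwise match a chain of
// words joined by "::".
const pattern = /(?<!`\w*\s*|::)\b\w+(?:::\w+)+/g;

const matches = sample.match(pattern) ?? [];
console.log(matches); // expected: only Some::Class and Another::Class
```

The `\w*\s*` part of the lookbehind is what covers the fenced blocks in this sample: the first line after a fence such as "```ruby" is preceded by a backtick, a language tag, and a newline. A fenced block with several lines of code would need more than this, since only the text directly after the fence is covered.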


For GNU grep, consider this command:

grep -ZzoP '`\w*\s*\b\w+::\w+(*SKIP)(*F)|\b\w+::\w+' file |
xargs -0 printf '%s\n'

Some::Class
Another::Class

RegEx Demo 3
