Regex removing bold markdown from inside codeblock only

Time:08-22

I'm bulk-editing some Markdown files to make them compliant with MkDocs syntax (Material theme).

My previous documentation software accepted bold inside code blocks, but I've now discovered that this is far from standard.

I have more than 10k code blocks in this documentation, spread over more than 300 md files in nested directories, and most of them use ** to bold some words.

To be precise, I need to turn every code block from this:

this is a **code block** with some commands

```my_lexer
enable
configure **terminal**
interface **g0/0**
```

to this

this is a **code block** with some commands

```my_lexer
enable
configure terminal
interface g0/0
```

The fun parts:

  • there are bold words in the rest of the document (outside code blocks) that I would like to keep
  • not every row of a code block has bold in it
  • not every code block necessarily has bold in it

Right now I'm using Visual Studio Code with its replace-in-files feature, and most of the easy regexes I wrote for the porting work. But its regex syntax is not quite standard (for example, groups are referenced with $1 instead of \1, and there may be other differences I don't know about).
I'm open to other software (and regex flavors) too, as long as it is more regex compliant and supports 'replace in all files and subdirectories' (like Notepad++, Atom, etc.).

Sadly, I don't even know how to start on something this complicated.
The furthest I got is this: https://regex101.com/r/vRnkop/1 (the text I'm using to test it is there too)

(^```.*\n)(.*?\*\*(.*?)\*\*.*$\n)*

I doubt this is even a good start!

Thanks

CodePudding user response:

Visual Studio Code is not my forte, but I did read that you should be able to use PCRE2 regex syntax. Therefore, try replacing the following pattern with an empty string:

\*\*(?=(((?!^```).)*^```)(?:(?1){2})*(?2)$)

See an online demo. The pattern seems a bit rocky, and maybe someone else knows a much simpler one. However, I did want to make sure this would both leave italic alone and turn bold italic into italic. Note that . matches newline here.

CodePudding user response:

If you have Unix tools like sed, it is quite easy:

sed '/^```my_lexer/,/^```/ s/\*\*//g' orig.md >new.md
  • /regex1/,/regex2/ cmd looks for a group of lines where the first line matches the first regex and the final line matches the second regex, and then runs cmd on each of them. This limits the replacements to the relevant sections of the file.
  • s/\*\*//g does search and replace; the g flag removes every ** on a line rather than only the first one (I have assumed any instance of ** should be deleted).
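
If sed is not available, the same range idea can be sketched in Python. This is only a minimal sketch under some assumptions: the function name strip_bold_in_fences is made up for illustration, and it treats every line starting with three backticks as opening or closing a block (not only my_lexer blocks, unlike the sed address above).

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def strip_bold_in_fences(text: str) -> str:
    """Remove every '**' from lines inside fenced code blocks only."""
    out = []
    in_fence = False
    for line in text.splitlines(keepends=True):
        if line.startswith(FENCE):
            # A fence line opens or closes a block; it is copied unchanged.
            in_fence = not in_fence
            out.append(line)
        elif in_fence:
            # Inside a block: drop all '**' markers on this line.
            out.append(line.replace("**", ""))
        else:
            # Outside a block: bold text is kept as-is.
            out.append(line)
    return "".join(out)


sample = (
    "this is a **code block** with some commands\n"
    "\n"
    f"{FENCE}my_lexer\n"
    "enable\n"
    "configure **terminal**\n"
    f"interface **g0/0** ***note***\n"
    f"{FENCE}\n"
)
print(strip_bold_in_fences(sample))
```

Deleting the two-character marker also turns bold italic (***word***) into italic (*word*) and leaves plain italic untouched, because a single * never matches.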

Some versions of sed allow "in-place" editing with -i. For example, to edit file.md and keep the original version as file.md.orig:

sed -i.orig '...' file.md

and you can edit multiple files with something like:

find . -name '*.md' -exec sed -i.orig '...' {} \;
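
For machines without find and sed, the recursive in-place edit could look roughly like this in Python. It is a sketch, not a drop-in tool: process_tree and strip_bold_in_fences are hypothetical names, every line starting with three backticks is assumed to toggle a code block, and (unlike sed -i.orig) a .orig backup is written only for files that actually change. The demo runs on a throwaway temporary directory.

```python
import tempfile
from pathlib import Path

FENCE = "`" * 3  # three backticks


def strip_bold_in_fences(text: str) -> str:
    """Remove every '**' from lines inside fenced code blocks only."""
    out, in_fence = [], False
    for line in text.splitlines(keepends=True):
        if line.startswith(FENCE):
            in_fence = not in_fence  # fence lines toggle the state
        elif in_fence:
            line = line.replace("**", "")
        out.append(line)
    return "".join(out)


def process_tree(root: Path) -> list[Path]:
    """Rewrite every *.md file under root, keeping a file.md.orig backup."""
    changed = []
    for path in sorted(root.rglob("*.md")):
        # Bytes are used so that existing line endings are preserved.
        original = path.read_bytes().decode("utf-8")
        updated = strip_bold_in_fences(original)
        if updated != original:
            path.with_name(path.name + ".orig").write_bytes(original.encode("utf-8"))
            path.write_bytes(updated.encode("utf-8"))
            changed.append(path)
    return changed


# Demo on a temporary directory so no real documentation is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    doc = root / "docs" / "nested" / "page.md"
    doc.parent.mkdir(parents=True)
    doc.write_text(
        f"**keep**\n{FENCE}my_lexer\nconfigure **terminal**\n{FENCE}\n",
        encoding="utf-8",
    )
    changed = process_tree(root)
    result = doc.read_text(encoding="utf-8")
    backup_exists = doc.with_name("page.md.orig").exists()

print(result)
```

To run it on a real tree, call process_tree(Path("docs")) with the documentation root, ideally on a copy or a clean version-control checkout first.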