Regular expression for replacing word in text that skips sections of text

Time:11-23

I have the below HTML file (simplified code):

<!DOCTYPE html>
<html lang="en" class="no-js">
...

<body>
  This is my header!
  {% include "header.tpl.html"%}
</body>

</html>

And I have the following JS snippet, which is supposed to replace all occurrences of "header" in the HTML:

const v = 'header';
let re = new RegExp(`${v}`, 'gi');
fileStr = fileStr.replace(re, '{% test %}');

As a result, I get:

<!DOCTYPE html>
<html lang="en" class="no-js">
...

<body>
  This is my {% test %}!
  {% include "{% test %}.tpl.html"%}
</body>

</html>

So, the first replacement works as expected, but the regular expression also matches occurrences inside the {% ... %} markers, which breaks my template.

How do I build a regular expression that skips all text between {% and %} markers?

Answer:

You can match the {%...%} substrings and capture them, so that they can be put back unchanged and thus skipped:

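let fileStr = ' This is my header!\n  {% include "header.tpl.html"%}';
const v = 'header';
// Alternative 1 captures a whole {% ... %} tag; alternative 2 matches the word anywhere else.
const re = new RegExp(`({%[^]*?%})|${v}`, 'gi');
// If Group 1 (the tag) matched, put it back unchanged; otherwise insert the replacement.
fileStr = fileStr.replace(re, (match, tag) => tag || '{% test %}');
console.log(fileStr);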

The ({%[^]*?%})|header pattern has two alternatives. The first matches {%, then any zero or more characters (including line breaks), as few as possible, and then %}, capturing the whole tag into Group 1. The second matches the header string, case-insensitively, in any other context.

Then, if Group 1 matched, the replacement is the Group 1 value, so the {%...%} substrings are kept as is; otherwise, header is replaced with {% test %}.
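The same idea can be wrapped in a reusable function. This is only a minimal sketch: the helper names escapeRegExp and replaceOutsideTags are hypothetical (they are not part of the original answer), and escaping the search word is an added precaution for words that contain regex metacharacters.

```javascript
// Hypothetical helper: escape regex metacharacters so the word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replace `word` everywhere except inside {% ... %} tags.
function replaceOutsideTags(text, word, replacement) {
  const re = new RegExp(`({%[^]*?%})|${escapeRegExp(word)}`, 'gi');
  // Group 1 is defined only when a whole tag was matched: return it unchanged.
  // Otherwise the word was matched outside a tag: return the replacement.
  return text.replace(re, (match, tag) => (tag !== undefined ? tag : replacement));
}

const input = ' This is my header!\n  {% include "header.tpl.html"%}';
const output = replaceOutsideTags(input, 'header', '{% test %}');
console.log(output);
```

A replacement returned from the callback is inserted literally and is not scanned again, so the inserted {% test %} is never touched by a later match.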

You may also use

let re = new RegExp(`({%[^]*?%})|\\b${v}\\b`, 'gi');

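to replace whole words only. This works only if the word starts and ends with a word character (a letter, digit or underscore) and contains no regex metacharacters; otherwise, you would need to escape the word and use other types of boundaries.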
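One possible choice of "other boundaries" is a pair of lookarounds, (?<!\w) and (?!\w), which only require that no word character sits next to the match. The sketch below assumes an engine with lookbehind support (ES2018 or later); the helper names escapeRegExp and replaceWholeTerm and the sample string are hypothetical.

```javascript
// Hypothetical helper: escape regex metacharacters so the term is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replace `term` as a whole token, skipping {% ... %} tags.
function replaceWholeTerm(text, term, replacement) {
  // (?<!\w) and (?!\w) forbid a word character right before and right after the term.
  // Unlike \b, they also work when the term starts or ends with a symbol such as "+".
  const re = new RegExp(
    `({%[^]*?%})|(?<!\\w)${escapeRegExp(term)}(?!\\w)`,
    'gi'
  );
  return text.replace(re, (match, tag) => (tag !== undefined ? tag : replacement));
}

const sample = 'Use c++ here, not abc++ or c++x. {% include "c++.tpl" %}';
const result = replaceWholeTerm(sample, 'c++', 'cpp');
console.log(result);
```

With \b on both sides, the standalone c++ would not match at all, because there is no word boundary between + and the following space.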
