JavaScript: replacing words with a regular expression (regex) skips a word next to whitespace

Time:11-19

I need to use regular expressions to wrap HTML tags around certain words in a text.

Here is my JavaScript example:

In this case, the first "We" is not replaced. Why, and how can I fix it?

var str="Welcome Microsoft We are Microsoft! we wehas weo in the WE world we.";
var res = str.replace(/([\s\!\.])(micro|microsoft|we)([\s\!\.])/gi, "$1<em>$2</em>$3");
console.log(res);
// wrong (actual output):   Welcome <em>Microsoft</em> We are <em>Microsoft</em>! <em>we</em> wehas weo in the <em>WE</em> world <em>we</em>.
// right (expected output): Welcome <em>Microsoft</em> <em>We</em> are <em>Microsoft</em>! <em>we</em> wehas weo in the <em>WE</em> world <em>we</em>.

CodePudding user response:

"We" is not replaced. Why?

Because the space that precedes it was already consumed by the previous match (as the third capture group, right after "Microsoft"). The next space the regex engine can use to precede a word is therefore the one after "We".

In other words, your regex matches an additional character after the word, and that character cannot be reused by the next match.
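
To make the consumed character visible, here is a small sketch that lists where each match of the question's pattern starts and ends. The variable names (str, re, spans) are only illustrative, and the redundant escapes inside the character class are dropped; the behavior is otherwise the same as in the question.

```javascript
// Same input and pattern as in the question (redundant escapes dropped).
const str = "Welcome Microsoft We are Microsoft! we wehas weo in the WE world we.";
const re = /([\s!.])(micro|microsoft|we)([\s!.])/gi;

// Record where each match starts and ends (end is exclusive).
const spans = [...str.matchAll(re)].map((m) => ({
  text: m[0],
  start: m.index,
  end: m.index + m[0].length,
}));

console.log(spans[0]);              // { text: ' Microsoft ', start: 7, end: 18 }
console.log(str.indexOf("We are")); // 18
```

The first match ends at index 18, which is exactly where "We" begins. The space at index 17 is already part of that first match, so it cannot serve as the leading character for "We".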

How to modify it?

The quick fix is to turn that trailing check into a look-ahead, so the character after the word is tested but not consumed (by the way, there is no need to escape ! or . inside a character class):

str.replace(/([\s!.])(micro|microsoft|we)(?=[\s!.])/gi, "$1<em>$2</em>");
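
As a quick check, the look-ahead version can be run against the input from the question. This is only a sketch; the variable names input and fixed are illustrative.

```javascript
// Input string from the question.
const input = "Welcome Microsoft We are Microsoft! we wehas weo in the WE world we.";

// The trailing character is only tested by the look-ahead, not consumed,
// so it stays available as the leading character of the next match.
const fixed = input.replace(/([\s!.])(micro|microsoft|we)(?=[\s!.])/gi, "$1<em>$2</em>");

console.log(fixed);
// Welcome <em>Microsoft</em> <em>We</em> are <em>Microsoft</em>! <em>we</em> wehas weo in the <em>WE</em> world <em>we</em>.
```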

This solves the case at hand, but a word at the very start or very end of the input still won't match, because there is no preceding or following character for the pattern to find.
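
The edge case can be reproduced with a short input where the target words are first and last. The second pattern below is not from the answer itself: it is one possible variant, assuming you want to keep the character-class approach, that also accepts the start (^) and end ($) of the input. The names edge, lookaheadOnly and anchored are illustrative.

```javascript
// Target words sit at the very start and the very end of the input.
const edge = "we like Microsoft";

// Look-ahead fix: needs a real character before and after each word.
const lookaheadOnly = edge.replace(
  /([\s!.])(micro|microsoft|we)(?=[\s!.])/gi,
  "$1<em>$2</em>"
);
console.log(lookaheadOnly); // we like Microsoft

// Assumed variant: also allow start of input before and end of input after.
const anchored = edge.replace(
  /(^|[\s!.])(micro|microsoft|we)(?=[\s!.]|$)/gi,
  "$1<em>$2</em>"
);
console.log(anchored); // <em>we</em> like <em>Microsoft</em>
```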

It is more common to use a word boundary, \b, instead:

str.replace(/\b(micro|microsoft|we)\b/gi, "<em>$1</em>");
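
A sketch of the \b version on the question's input and on an input where the words come first and last. The helper name emphasize is hypothetical. The last call shows a difference worth knowing: \b counts any non-word character as a boundary, so it also matches next to an apostrophe, which the character class [\s!.] would not.

```javascript
// Hypothetical helper wrapping the \b-based replacement from the answer.
function emphasize(text) {
  return text.replace(/\b(micro|microsoft|we)\b/gi, "<em>$1</em>");
}

console.log(emphasize("Welcome Microsoft We are Microsoft! we wehas weo in the WE world we."));
// Welcome <em>Microsoft</em> <em>We</em> are <em>Microsoft</em>! <em>we</em> wehas weo in the <em>WE</em> world <em>we</em>.

// Words at the very start and end of the input are matched too.
console.log(emphasize("we like Microsoft")); // <em>we</em> like <em>Microsoft</em>

// \b also sees a boundary before the apostrophe.
console.log(emphasize("we're here")); // <em>we</em>'re here
```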