(DOM cannot be used) Javascript regular expressions match text and filter HTML tags

Time:07-08

I need to use a regular expression to replace "text" in the text, but not the HTML tag attribute value "text".

Note: I know the various DOM manipulations, but my needs don't fit the DOM, so I can only use regular expressions here. I'm sorry.


Javascript code:

var str = '"text<div id="text">is text istext"text"</div> text '
var xxx = str.replace(/([\s.?,"';:!()\[\]{}<>/])(text)([\s.?,"';:!()\[\]{}<>/])/g, '$1xxxx$3');
console.log(xxx)

See the following example:

The wrong result is:

"xxxx<div id="xxxx">is xxxx istext"xxxx"</div> xxxx (This is the result of my code)

The correct result is:

"xxxx<div id="text">is xxxx istext"xxxx"</div> xxxx (id="text" stays the same here)

CodePudding user response:

It would be a lot simpler and more reliable to break this into two distinct steps. The first step splits the HTML into tag and non-tag parts; the second transforms only the non-tag parts.

var s = '"text<div id="text">is text istext"text"</div> text ';
var r = s.split(/([<>])/); // split into an array of parts, keeping "<" and ">" as their own items
var result = r.map(function(segment){
 if(segment === "<") this.inTag = 1; // entering a tag
 if(segment === ">") this.inTag = 0; // leaving a tag
 if(this.inTag) return segment; // return markup unchanged
 return segment.replace(/([\s.?,"';:!()\[\]{}<>/])(text)([\s.?,"';:!()\[\]{}<>/])/g, '$1xxxx$3');
},{inTag:0}) // the second argument to map() becomes `this` inside the callback
.join(""); // result === '"text<div id="text">is xxxx istext"xxxx"</div> xxxx '

It looks like you still need to tweak your regexp to match the first "text" (your pattern requires a character on both sides of the word, so a match at the very start or end of a segment is missed); I just used your version. That should be a lot simpler now, since the regexp no longer has to avoid tag markup.

Of course, the usual caveats and warnings about "parsing" HTML with RegExp apply (e.g., tag-looking markup inside script tags), but this should be a lot easier to work with if you absolutely, positively cannot use a real DOM to manipulate the content.
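The two-step approach above can be wrapped in a small function. This is only a sketch, not the answerer's exact code: replaceOutsideTags is a hypothetical helper name, and it swaps the question's character-class pattern for \b word boundaries so that a match at the very start of a segment is also replaced. It assumes the word contains no regex metacharacters and that the input has no stray "<" or ">" outside of tags.

```javascript
// Hypothetical helper (not from the original answer): replaces whole-word
// occurrences of `word` only in the text between tags.
function replaceOutsideTags(html, word, replacement) {
  // Assumption: `word` has no regex metacharacters, so it is used unescaped.
  var wordPattern = new RegExp("\\b" + word + "\\b", "g");
  var inTag = false;
  // The capture group makes split() keep "<" and ">" as separate array items.
  return html.split(/([<>])/).map(function (segment) {
    if (segment === "<") { inTag = true; return segment; }
    if (segment === ">") { inTag = false; return segment; }
    if (inTag) return segment; // tag name and attributes stay unchanged
    return segment.replace(wordPattern, replacement);
  }).join("");
}

var sample = '"text<div id="text">is text istext"text"</div> text ';
console.log(replaceOutsideTags(sample, "text", "xxxx"));
// prints: "xxxx<div id="text">is xxxx istext"xxxx"</div> xxxx
```

Keeping the in-tag state in a local variable instead of `this` avoids relying on the second argument to map().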

CodePudding user response:

You can use this regex:

/(?<=(^|>)[^<>]*)\btext\b/g

Explanation:

(?<= - look behind

(^|>) - for start of text OR a greater than sign >

[^<>]* - followed by zero or more of any character not being <>

) - end of look behind

\btext\b - match a word boundary, followed by text and another word boundary

It's using the global flag.

Now replace with xxxx or whatever.

You can see it here: https://regex101.com/r/LX6UwH/1
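As a quick check, the pattern can be applied to the string from the question. This sketch assumes a JavaScript engine that supports lookbehind assertions (added in ES2018); engines without that support will reject the regex literal with a syntax error. The variable names are only illustrative.

```javascript
// Lookbehind: the match must be preceded by the start of the string or a ">",
// with no "<" or ">" in between, i.e. it must not sit inside a tag.
var outsideTags = /(?<=(^|>)[^<>]*)\btext\b/g;

var html = '"text<div id="text">is text istext"text"</div> text ';
var replaced = html.replace(outsideTags, "xxxx");

console.log(replaced);
// prints: "xxxx<div id="text">is xxxx istext"xxxx"</div> xxxx
```

The id="text" attribute is left alone because the nearest angle bracket before it is "<", so the lookbehind fails there.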
