I'm trying to remove all HTML tags except <s></s> tags. Right now I have:
contents.replace(/(<([^>]+)>)/gi, '')
This removes all HTML tags.
So I tried several other patterns:
<\/?(?!s)\w*\b[^>]*>
<(?!s|/s).*?>
However, these regexes treat every tag that starts with the letter 's' the same way as <s>, for example <strong>, <span>, and so on.
I'd really appreciate any help.
CodePudding user response:
Whether or not this is possible depends on how accurate you want to be. Regex cannot be used to 100% accurately parse HTML.
But if you just want something quick and dirty:
You can take advantage of the fact that String.prototype.replace accepts a replacement function, which receives each capture group as a separate argument: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#specifying_a_function_as_the_replacement
So you can make two capture groups:
Group 1 (<s> or </s>): (<\/?s>)
Group 2 ("starts with <, ends with >, and has no > in between"): (<[^>]*>)
Then, in the function passed to string.replace, return the match if group 1 matched; otherwise only group 2 matched, so return an empty string:
function removeTags(text) {
  const regex = /(<\/?s>)|(<[^>]*>)/g; // Group 1 OR Group 2
  // g1 is undefined when only group 2 matched, so the tag is dropped.
  return text.replace(regex, (_, g1) => g1 || '');
}
let text = '<span>Span Text <s>S Text <strong>Strong Text</strong></s></span>';
console.log(removeTags(text)); // Span Text <s>S Text Strong Text</s>
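Group 1 above only keeps a bare <s> or </s>; an opening tag with attributes, such as <s class="x">, would fall into group 2 and be removed. A minimal sketch of a variant that captures the tag name and checks it against a list of allowed names follows. The function name removeTagsExcept and its allowed parameter are hypothetical (not part of the answer above), and the sketch still assumes that no attribute value contains a literal >:

```javascript
// Hypothetical helper: strip every tag whose name is not in `allowed`.
// Assumes no attribute value contains a literal '>' (a regex cannot handle that).
function removeTagsExcept(text, allowed = ['s']) {
  const keep = new Set(allowed.map((name) => name.toLowerCase()));
  // Group 1 captures the tag name. The lookahead requires the name to end at
  // whitespace, '/' or '>', and [^>]* then swallows any attributes.
  const regex = /<\/?([a-z][a-z0-9-]*)(?=[\s\/>])[^>]*>/gi;
  return text.replace(regex, (tag, name) =>
    keep.has(name.toLowerCase()) ? tag : ''
  );
}

const sample = '<span>Span Text <s class="x">S Text <strong>Strong Text</strong></s></span>';
console.log(removeTagsExcept(sample));
// Span Text <s class="x">S Text Strong Text</s>
```

Because the pattern requires a letter right after < or </, a stray < in plain text is left alone, but comments and doctype declarations are not removed either. It is still a quick-and-dirty approach, not a parser.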
Note the flaw: if < and > appear as plain text, everything between them may be treated as a tag even though it is not:
function removeTags(text) {
  const regex = /(<\/?s>)|(<[^>]*>)/g; // Group 1 OR Group 2
  return text.replace(regex, (_, g1) => g1 || '');
}
let text = '<p> This is how you start a tag: `<` and this is how you end a tag: `>`</p>';
console.log("But the regex fails:");
console.log(removeTags(text));
An HTML parser can tell that these brackets do not form a tag:
<p> This is how you start a tag: `<` and this is how you end a tag: `>`</p>
If you want accurate parsing, use a real parser (for example, the browser's DOMParser) instead of a regex.
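A parser-based version could look roughly like the sketch below. It assumes a browser-like environment where DOMParser is defined and parses the input as text/html; the function name removeTagsWithParser is hypothetical. Every element except <s> is replaced by its own children, so only the tag itself disappears:

```javascript
// Sketch: unwrap every element except <s> using the browser's HTML parser.
// Assumes a browser-like environment where DOMParser is defined.
function removeTagsWithParser(html) {
  if (typeof DOMParser === 'undefined') {
    throw new Error('DOMParser is not available in this environment');
  }
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // querySelectorAll returns a static list, so the tree can be modified
  // while iterating over it.
  for (const el of doc.body.querySelectorAll('*')) {
    if (el.tagName.toLowerCase() !== 's') {
      // Replace the element with its own children, i.e. drop only the tag.
      el.replaceWith(...el.childNodes);
    }
  }
  return doc.body.innerHTML;
}

if (typeof DOMParser !== 'undefined') {
  console.log(removeTagsWithParser('<span>Span Text <s>S Text <strong>Strong Text</strong></s></span>'));
}
```

The result is serialized HTML, so a literal < or > in the text comes back escaped as &lt; or &gt;. The contents of elements such as <script> or <style> would also be kept as text, which may or may not be what you want.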
CodePudding user response:
You could try: /(<([^>s]+)>)|(<\/?(\w{2,})>)/gmi
The first part, (<([^>s]+)>), matches every tag that does not contain the letter s.
The second part, (<\/?(\w{2,})>), matches every tag that consists only of a name of two or more characters.
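A quick way to check this pattern is to run it on a couple of inputs. The sketch below assumes the quantifier after [^>s] is +, and the helper name stripWithAlternation is hypothetical. It also shows a case the pattern appears to miss: an opening tag with a one-letter name and an attribute containing the letter s matches neither alternative.

```javascript
// Hypothetical helper wrapping the pattern from this answer.
// Assumes the quantifier after [^>s] is '+'.
function stripWithAlternation(text) {
  const regex = /(<([^>s]+)>)|(<\/?(\w{2,})>)/gi;
  return text.replace(regex, '');
}

// <s> and </s> survive; <span>, <strong> and their closing tags are removed.
console.log(stripWithAlternation('<span>Span <s>S <strong>Strong</strong></s></span>'));
// Span <s>S Strong</s>

// Limitation: <a class="x"> contains an 's' and is not a bare name of two or
// more characters, so neither alternative matches and the tag stays in place.
console.log(stripWithAlternation('<a class="x">link</a>'));
// <a class="x">link
```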