Is there any way to strip an HTML tag together with its content?
Example:
const regexForStripHTML = /(<([^>]+)>)/gi
const text = "OCEP <sup>®</sup> water product"
const stripContent = text.replaceAll(regexForStripHTML, '')
Output: 'OCEP ® water product'
I also need to remove the ® from the string.
Expected output:
OCEP water product
CodePudding user response:
Removing all HTML tags and the innerText can be done with the following snippet. The Regexp captures the opening tag's name, then matches all content between the opening and closing tags, then uses the captured tag name to match the closing tag.
const regexForStripHTML = /<([^</> ]+)[^<>]*?>[^<>]*?<\/\1> */gi;
const text = "OCEP <sup>®</sup> water product";
const stripContent = text.replaceAll(regexForStripHTML, '');
console.log(text);
console.log(stripContent);
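A minimal sketch of the same backreference idea wrapped in a function, to show that it also copes with attributes and with several elements in one string. The helper name stripElements and the second input string are made up for illustration; elements nested inside one another are not handled here.

```javascript
// Hypothetical helper; the name stripElements is not from the original answer.
function stripElements(html) {
  // Group 1 captures the tag name, [^<>]*? skips any attributes,
  // the second [^<>]*? skips the text content, and \1 requires the
  // closing tag to carry the same name. " *" eats trailing spaces.
  const elementPattern = /<([^</> ]+)[^<>]*?>[^<>]*?<\/\1> */gi;
  return html.replaceAll(elementPattern, '');
}

console.log(stripElements('OCEP <sup>®</sup> water product'));
// → OCEP water product
console.log(stripElements('a <span class="x">b</span> c <i>d</i> e'));
// → a c e
```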
CodePudding user response:
This should suffice for your use case:
const regexForStripHTML = /<sup[^>]*>.*?<\/sup>/ig
const text = "OCEP <sup>®</sup> water product"
const stripContent = text.replaceAll(regexForStripHTML, '');
console.log(stripContent);
If you want to do it with any HTML tag, see the code below:
const regexForStripHTML = /<.*>.*?/ig
const text = "OCEP <html>®</html> water product"
const stripContent = text.replaceAll(regexForStripHTML, '');
console.log(stripContent);
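One caveat with /<.*>.*?/: the .* inside the angle brackets is greedy, so when a string contains more than one element, everything from the first < to the last > is removed, including the text in between. The sketch below compares it with a lazy pattern that pairs each opening tag with its own closing tag; the input string and the variable names are made up for illustration.

```javascript
// Illustrative input with two separate elements.
const sample = "OCEP <sup>®</sup> water <b>product</b> line";

// Greedy: matches from the first "<" to the last ">" on the line.
const greedyPattern = /<.*>.*?/ig;

// Lazy, with a backreference: each opening tag is matched with the
// closing tag of the same name, so text between elements survives.
const pairedPattern = /<([a-z][a-z0-9]*)\b[^>]*>.*?<\/\1>/ig;

const greedyResult = sample.replaceAll(greedyPattern, '');
const pairedResult = sample.replaceAll(pairedPattern, '');

console.log(JSON.stringify(greedyResult)); // → "OCEP  line"
console.log(JSON.stringify(pairedResult)); // → "OCEP  water  line"
```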
CodePudding user response:
Context
To remove the text between the tags, you need to match an opening and a closing tag with the same tag name. The regex <(?<tagname>.*?)> matches the opening tag. Notice how the named group tagname remembers the tag name, which is then reused in the part of the regex that matches the corresponding closing tag, <\/\k<tagname>>. The part in between, .*?, matches any text.
Code
const regexForStripHTML = /(<(?<tagname>.*?)>.*?<\/\k<tagname>>)/g
const text = "OCEP <sup>®</sup> water product"
const stripContent = text.replaceAll(regexForStripHTML, '')
Note
I haven't thought about what happens if the tags are nested.
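For nested tags, one possible approach (a sketch, not a full HTML parser) is to remove only innermost elements, meaning those whose content contains no further tags, and repeat until the string stops changing. The helper name stripNestedElements and the nested input are made up for illustration; void elements such as <br> and unbalanced markup are not handled.

```javascript
// Hypothetical helper: strips elements innermost-first until nothing changes.
function stripNestedElements(html) {
  // [^<>]* as the content means only elements without child tags match.
  const innermost = /<([a-z][a-z0-9]*)\b[^<>]*>[^<>]*<\/\1>/gi;
  let previous;
  let current = html;
  do {
    previous = current;
    current = current.replace(innermost, '');
  } while (current !== previous);
  return current;
}

const nested = "OCEP <sup><b>®</b> reg</sup> water product";
// Pass 1 removes <b>®</b>, pass 2 removes the now tag-free <sup> element.
console.log(JSON.stringify(stripNestedElements(nested)));
// → "OCEP  water product"
```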