I want to remove all attributes from HTML tags using regex. The input can be any HTML element, including nested elements, for example:
<div fadeout"="" style="margin:0px;" class="xyz">
<img src="abc.jpg" alt="" />
<p style="margin-bottom:10px;">
The event is celebrating its 50th anniversary Kö
<a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
</p>
<p style="padding:0px;"></p>
<p style="color:black;">
<strong>A festival for art lovers</strong>
</p>
</div>
or it could look like this:
<span style="margin: 0;"><p > Test text</p></span>
For security reasons, I need to remove the attributes.
Here is what I have tried:
s/(<\w+)\s+[^>]*/$1/
<*\b[^<]*>(?:[^<]+(?:<(?!\/?div\b)[^<]*)*|(?R))*<\/*>\s*
<([a-z][a-z0-9]*)[^>]*?(\/?)>
but none of them work.
CodePudding user response:
Regex should not be used to parse HTML. Instead, use a DOMParser to parse the string, loop through each element's attributes, and call Element.removeAttribute on each one:
const str = `<div fadeout"="" style="margin:0px;" >
<img src="abc.jpg" alt="" />
<p style="margin-bottom:10px;">
The event is celebrating its 50th anniversary Kö
<a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
</p>
<p style="padding:0px;"></p>
<p style="color:black;">
<strong>A festival for art lovers</strong>
</p>
</div>`
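function stripAttributes(html) {
  // Parse the string into a separate document; nothing is inserted into the live page
  const parsed = new DOMParser().parseFromString(html, 'text/html');
  // Copy each attribute list before removing, because elem.attributes is a live collection
  parsed.body.querySelectorAll('*').forEach(elem =>
    [...elem.attributes].forEach(attr => elem.removeAttribute(attr.name))
  );
  return parsed.body.innerHTML;
}
console.log(stripAttributes(str));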
CodePudding user response:
I would advise against using regex in this situation, but if you don't have a choice, maybe you are looking for something like this:
/<\s*([a-z][a-z0-9]*)\s.*?>/gi
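The answer gives only the pattern, so here is a minimal sketch of how it might be applied. It assumes the pattern is meant to be passed to String.prototype.replace with `<$1>` as the replacement (keeping just the captured tag name); the function name `stripAttributesWithRegex` is made up for illustration. The second call shows one way this approach breaks.

```javascript
// Assumption: the pattern is used with String.prototype.replace,
// keeping only the captured tag name. The function name is hypothetical.
function stripAttributesWithRegex(html) {
  return html.replace(/<\s*([a-z][a-z0-9]*)\s.*?>/gi, '<$1>');
}

console.log(stripAttributesWithRegex('<span style="margin: 0;"><p > Test text</p></span>'));
// → <span><p> Test text</p></span>

// A ">" inside an attribute value ends the match too early:
console.log(stripAttributesWithRegex('<a href="x" title="a>b">link</a>'));
// → <a>b">link</a>
```

Two other things to be aware of: `.` does not match newlines, so an opening tag whose attributes span several lines is left untouched, and the trailing `/` of a self-closing tag such as `<img ... />` is dropped along with the attributes.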
CodePudding user response:
The nice thing about working with the DOM is that you have a whole set of tools available to you that were designed specifically for manipulating a DOM! And yet people insist on treating this complex structured data format as though it's just a dumb string and start hacking away at it with regex.
Use the right tool for the job.
function removeAttributesRecursively(el) {
  Array.from(el.attributes).forEach(function(attr) {
    // you'll probably want to include extra logic here to
    // preserve some attributes (a href, img src, etc)
    // instead of blindly removing all of them
    el.removeAttribute(attr.name);
  });
  // recurse:
  Array.from(el.children).forEach(function(child) {
    removeAttributesRecursively(child);
  });
}
const root = document.getElementById('input');
removeAttributesRecursively(root);
console.log(root.innerHTML);
<div id="input">
<div fadeout="" style="margin:0px;" class="xyz">
<img src="abc.jpg" alt="" />
<p style="margin-bottom:10px;">
The event is celebrating its 50th anniversary Kö
<a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
</p>
<p style="padding:0px;"></p>
<p style="color:black;">
<strong>A festival for art lovers</strong>
</p>
</div>
</div>
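The comment in the function above suggests preserving some attributes (a href, img src, etc). One way that could look is sketched below. The `ALLOWED` map and the name `removeAttributesExcept` are hypothetical, and the list of attributes to keep is only an example, not a vetted security policy.

```javascript
// Hypothetical allowlist: lowercase tag name -> attribute names to keep.
const ALLOWED = { a: ['href'], img: ['src', 'alt'] };

function removeAttributesExcept(el, allowed = ALLOWED) {
  // tagName is uppercase for HTML elements, so normalise it before the lookup
  const keep = allowed[el.tagName.toLowerCase()] || [];
  // Copy the attribute list first, since it changes as attributes are removed
  Array.from(el.attributes).forEach(function (attr) {
    if (!keep.includes(attr.name)) {
      el.removeAttribute(attr.name);
    }
  });
  // recurse into child elements
  Array.from(el.children).forEach(function (child) {
    removeAttributesExcept(child, allowed);
  });
}
```

In the snippet above it would be called as removeAttributesExcept(document.getElementById('input')). Keep in mind that a preserved href can still carry a javascript: URL, so if security is the goal the values of kept attributes need checking as well.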