Regex to remove all attributes from nested html tags - Javascript

Time:11-10

I want to remove the attributes from HTML tags using regex. The input can be any HTML element and may contain nested elements, for example:

<div fadeout"="" style="margin:0px;" class="xyz">
    <img src="abc.jpg" alt="" />
    <p style="margin-bottom:10px;">
    The event is celebrating its 50th anniversary K&ouml;&nbsp;
    <a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
    </p>
    <p style="padding:0px;"></p>
    <p style="color:black;">
       <strong>A festival for art lovers</strong>
    </p>
</div>

Or it could look like this:

<span style="margin: 0;"><p > Test text</p></span>

I need to remove the attributes for security reasons.

Here is what I have tried:

s/(<\w+)\s+[^>]*/$1/

<*\b[^<]*>(?:[^<]+(?:<(?!\/?div\b)[^<]*)*|(?R))*<\/*>\s*
<([a-z][a-z0-9]*)[^>]*?(\/?)>

but none of these work.

CodePudding user response:

Regex should not be used to parse HTML.

Instead, you should use a DOMParser to parse the string, loop through each element's attributes and use Element.removeAttribute:

const str = `<div fadeout"="" style="margin:0px;" class="xyz">
    <img src="abc.jpg" alt="" />
    <p style="margin-bottom:10px;">
    The event is celebrating its 50th anniversary K&ouml;&nbsp;
    <a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
    </p>
    <p style="padding:0px;"></p>
    <p style="color:black;">
       <strong>A festival for art lovers</strong>
    </p>
</div>`

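function stripAttributes(html) {
  // Parse the string into a separate document instead of touching the live page
  const parsed = new DOMParser().parseFromString(html, 'text/html');
  parsed.body.querySelectorAll('*').forEach(elem => {
    // elem.attributes is live, so copy it before removing anything
    [...elem.attributes].forEach(attr => elem.removeAttribute(attr.name));
  });
  return parsed.body.innerHTML;
}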

console.log(stripAttributes(str))

CodePudding user response:

I would advise against using regex in this situation, but if you don't have a choice, maybe you are looking for something like this:

/<\s*([a-z][a-z0-9]*)\s.*?>/gi 
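
For illustration, here is a minimal sketch of how that pattern might be applied with String.prototype.replace, substituting each match with the captured tag name. The function name stripAttributesWithRegex and the sample strings are assumptions, not part of the answer above. The last call shows one way it breaks: a > inside an attribute value ends the match early and leaves debris in the output.

```javascript
// Hypothetical helper built around the pattern above; the name is illustrative.
function stripAttributesWithRegex(html) {
  // $1 is the captured tag name, so each matched opening tag is rebuilt without attributes.
  return html.replace(/<\s*([a-z][a-z0-9]*)\s.*?>/gi, '<$1>');
}

console.log(stripAttributesWithRegex('<span style="margin: 0;"><p > Test text</p></span>'));
// <span><p> Test text</p></span>

console.log(stripAttributesWithRegex('<img src="abc.jpg" alt="" />'));
// <img>

// Failure case: the ">" inside the title value is taken as the end of the tag.
console.log(stripAttributesWithRegex('<a title="a > b" href="x">link</a>'));
// <a> b" href="x">link</a>
```

Self-closing tags lose their trailing slash, and because . does not match line breaks, tags whose attributes are spread over multiple lines can be missed. That is part of why the DOM-based answers are the safer route.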

CodePudding user response:

The nice thing about working with the DOM is that you have a whole set of tools available to you that were designed specifically for manipulating a DOM! And yet people insist on treating this complex structured data format as though it's just a dumb string and start hacking away at it with regex.

Use the right tool for the job.

function removeAttributesRecursively(el) {
  Array.from(el.attributes).forEach(function(attr) {
    // you'll probably want to include extra logic here to
    // preserve some attributes (a href, img src, etc)
    // instead of blindly removing all of them
    el.removeAttribute(attr.name);
  });
  // recurse:
  Array.from(el.children).forEach(function(child) {
    removeAttributesRecursively(child)
  })
}

const root = document.getElementById('input');

removeAttributesRecursively(root)

console.log(root.innerHTML)

<div id="input">
  <div fadeout="" style="margin:0px;" class="xyz">
    <img src="abc.jpg" alt="" />
    <p style="margin-bottom:10px;">
      The event is celebrating its 50th anniversary K&ouml;&nbsp;
      <a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
    </p>
    <p style="padding:0px;"></p>
    <p style="color:black;">
      <strong>A festival for art lovers</strong>
    </p>
  </div>
</div>
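
The comment in removeAttributesRecursively mentions preserving some attributes. A minimal sketch of that idea, assuming a per-tag allowlist, might look like the following. The names ALLOWED_ATTRIBUTES and removeAttributesExcept, and the particular attributes kept, are hypothetical choices for illustration. Note that it only filters attribute names: a kept href can still hold a javascript: URL, so the values would need their own validation if security is the goal.

```javascript
// Hypothetical allowlist: lowercase tag name -> attribute names to keep.
// A Map avoids accidental hits on Object.prototype keys for odd tag names.
const ALLOWED_ATTRIBUTES = new Map([
  ['a', ['href']],
  ['img', ['src', 'alt']],
]);

function removeAttributesExcept(el, allowed = ALLOWED_ATTRIBUTES) {
  // Tags that are not listed keep nothing.
  const keep = allowed.get(el.tagName.toLowerCase()) || [];

  // Copy the attribute list first, because it changes as attributes are removed.
  Array.from(el.attributes).forEach(function (attr) {
    if (!keep.includes(attr.name)) {
      el.removeAttribute(attr.name);
    }
  });

  // Recurse into child elements with the same allowlist.
  Array.from(el.children).forEach(function (child) {
    removeAttributesExcept(child, allowed);
  });
}
```

In the page above it could be called as removeAttributesExcept(document.getElementById('input')), which would also strip the id from the root element itself, just like the original function does.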