How to remove or close unclosed HTML in JavaScript?

Time:04-29

Suppose I have a string like this that will be inserted into my HTML:

<div>Wakanda Forever</div> <span >Black Panther</span>
Movies movies movies 
<span >Spider man...

The last span tag isn't closed.

Using a regular expression in JavaScript, how can I either remove the unclosed <span > from <span >Spider man... or close it with a </span> tag?

CodePudding user response:

Using regular expressions to do any sort of HTML manipulation is almost always a bad idea.
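To illustrate why, here is a minimal sketch of a naive regex attempt. The function name stripUnclosedSpanNaive and the two sample strings are made up for this example. It removes any opening <span> that has no </span> anywhere after it, which handles the string from the question but silently misses an unclosed outer span when a nested span is closed:

```javascript
// Naive attempt (hypothetical helper, for illustration only):
// remove every opening <span ...> tag that is not followed by any </span>.
function stripUnclosedSpanNaive(html) {
  return html.replace(/<span\b[^>]*>(?![\s\S]*<\/span>)/g, '');
}

// Flat case, like the one in the question: the last span is unclosed.
const simple = '<span >Black Panther</span> <span >Spider man...';

// Nested case: the OUTER span is unclosed, but a </span> still follows it.
const nested = '<span >Outer <span >Inner</span> trailing text';

console.log(stripUnclosedSpanNaive(simple));
// '<span >Black Panther</span> Spider man...'

console.log(stripUnclosedSpanNaive(nested) === nested);
// true: the regex cannot tell which opening tag the </span> belongs to
```

A regular expression has no notion of which closing tag pairs with which opening tag, so every patch to the pattern tends to break on another nesting shape.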

My recommended solution would be to do what the browser does: Parse the string into a DOM (in a similar fuzzy, forgiving way) and then turn that DOM back into HTML.

In a browser environment, this is especially easy because you can let the browser itself do it for you, by writing the bad HTML into innerHTML of an element and then reading it back - and the browser will have fixed it for you:

const badHtml = `
<div>Wakanda Forever</div> <span >Black Panther</span>
Movies movies movies 
<span >Spider man...
`

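// Create a detached container; it never needs to be added to the page.
const element = document.createElement('div')
// Assigning to innerHTML runs the browser's forgiving HTML parser.
element.innerHTML = badHtml
// Reading it back serializes the repaired DOM, with the missing </span> added.
const result = element.innerHTML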

console.log(result)
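The same round trip could be wrapped in a small helper. The sketch below is one possible variant, not part of the original answer: repairHtml and its doc parameter are hypothetical names, and it uses a <template> element as the container, on the assumption that you would rather not have images in the string start loading while it is being parsed (template contents are inert).

```javascript
// Sketch: round-trip an HTML fragment through an inert <template> element.
// `repairHtml` and the `doc` parameter are hypothetical names for this example.
function repairHtml(html, doc = document) {
  const template = doc.createElement('template')
  // The parser closes any tags that were left open.
  template.innerHTML = html
  // Serialize the repaired tree back to a string.
  return template.innerHTML
}

// Only runs where a DOM is available, for example in a browser.
if (typeof document !== 'undefined') {
  console.log(repairHtml('<span >Spider man...'))
}
```

Passing the document in as a parameter is only there so the helper can also be used with a DOM implementation other than the global one.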

In Node.js, you could instead use a library like cheerio:

// cheerio 1.0 no longer has a default export, so import the namespace.
import * as cheerio from 'cheerio'

const badHtml = `
<div>Wakanda Forever</div> <span >Black Panther</span>
Movies movies movies 
<span >Spider man...
`

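// Passing false as the third argument parses the input as a fragment,
// so the output is not wrapped in <html>, <head> and <body>.
const $ = cheerio.load(badHtml, null, false)
const result = $.html()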

console.log(result)