We have an HTML string that contains a list, like the one shown below:
<ul><li>Item A</li><li>Item B</li><li>Item C</li><li>Item D</li></ul>
Here is the same markup, formatted for readability:
<ul>
<li>Item A</li>
<li>Item B</li>
<li>Item C</li>
<li>Item D</li>
</ul>
My objective is to write a regular expression that selects the first two items of the list, so the output should be:
<ul>
<li>Item A</li>
<li>Item B</li>
</ul>
If that's not possible with a regex, what would be the most efficient way to do it in plain JavaScript?
CodePudding user response:
Don't use a regex for this. Parse the HTML into a document with DOMParser and use DOM methods to remove the unwanted elements:
const html = `<ul><li>Item A</li><li>Item B</li><li>Item C</li><li>Item D</li></ul>`;
const parser = new DOMParser();
const doc = parser.parseFromString(html, "text/html");
// Remove all <li> elements after the 2nd
doc.querySelectorAll("li:nth-child(n+3)").forEach(el => el.remove());
// DOMParser puts the HTML fragment into the created document body
const newHtml = doc.body.innerHTML;
console.log(newHtml);
This also works when the optional end tags are omitted, like this...
<ul>
<li>Item A
<li>Item B
<li>Item C
<li>Item D
</ul>
which is something a regular expression could really struggle with.
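To illustrate that last point, here is a small sketch of what happens when a pattern that expects a closing `</li>` meets a list without end tags. The variable names and the pattern itself are my own choices for the demonstration, not something from the question; it only shows that this particular kind of pattern finds nothing to match.

```javascript
// Illustrative input: the same list, with the optional </li> end tags omitted.
const looseHtml = `<ul>
<li>Item A
<li>Item B
<li>Item C
<li>Item D
</ul>`;

// The same list with explicit end tags, for comparison.
const tightHtml = `<ul><li>Item A</li><li>Item B</li><li>Item C</li><li>Item D</li></ul>`;

// An assumed pattern that requires a closing </li> for every item.
const liPattern = /<li>(.*?)<\/li>/g;

// With a global pattern, match() returns an array of matches, or null if there are none.
const tightMatches = tightHtml.match(liPattern);
const looseMatches = looseHtml.match(liPattern);

console.log(tightMatches.length); // 4
console.log(looseMatches);        // null
```

The parser-based approach does not have this problem, because the HTML parser closes each `<li>` implicitly when it sees the next one.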
CodePudding user response:
You can do:
const str = `<ul><li>Item A</li><li>Item B</li><li>Item C</li><li>Item D</li></ul>`
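// Build a global, non-greedy pattern for a single tag without attributes
const re = tag => new RegExp(`<${tag}>(.*?)<\/${tag}>`, 'g')
// match is the whole <ul>...</ul>; inner is the captured text between the tags
const ul = str.replace(re('ul'), (match, inner) => {
  // Keep only the first two <li>...</li> matches
  const li = inner.match(re('li')).slice(0, 2).join('')
  return match.replace(inner, li)
})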
console.log(ul)
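If you want to reuse this, the same idea could be wrapped in a function that takes the number of items to keep. This is only a sketch: `keepFirstItems` is a hypothetical helper name, and it assumes a flat, single-line list whose `<ul>` and `<li>` tags have no attributes and whose items all have explicit end tags. It also falls back to an empty array when the list has no items, so an empty `<ul></ul>` does not throw.

```javascript
// Hypothetical helper: keep only the first `count` <li> items of each flat <ul>.
// Assumes no attributes, no nested lists, no newlines, and explicit </li> end tags.
function keepFirstItems(html, count) {
  return html.replace(/<ul>(.*?)<\/ul>/g, (whole, inner) => {
    // match() returns null when there are no items, so fall back to an empty array
    const items = inner.match(/<li>.*?<\/li>/g) || [];
    // Rebuild the list from the kept items; anything between items is dropped
    return `<ul>${items.slice(0, count).join("")}</ul>`;
  });
}

const sample = "<ul><li>Item A</li><li>Item B</li><li>Item C</li><li>Item D</li></ul>";
console.log(keepFirstItems(sample, 2));
// <ul><li>Item A</li><li>Item B</li></ul>
```

For anything beyond simple, predictable markup like this, parsing the HTML is the safer route.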