Regex to match the first two <li> of a <ul> list

We have an HTML string containing a list, something like the one shown below:

<ul><li>Item A</li><li>Item B</li><li>Item C</li><li>Item D</li></ul>

Here is the same markup, formatted:

<ul>
   <li>Item A</li>
   <li>Item B</li>
   <li>Item C</li>
   <li>Item D</li>
</ul>

My objective is to write a regular expression that selects the first two items of the list, so the output should be:

<ul>
   <li>Item A</li>
   <li>Item B</li>
</ul>

If that's not possible with a regex, what would be the most efficient way to do it in plain JavaScript?

CodePudding user response：

Don't use a regex for this. Parse the HTML into a document with DOMParser and use DOM methods to remove the unwanted elements:

const html = `<ul><li>Item A</li><li>Item B</li><li>Item C</li><li>Item D</li></ul>`;

const parser = new DOMParser();
const doc = parser.parseFromString(html, "text/html");

// Remove all <li> elements after the 2nd
doc.querySelectorAll("li:nth-child(n+3)").forEach(el => el.remove());

// DOMParser puts the HTML fragment into the created document body
const newHtml = doc.body.innerHTML;

console.log(newHtml);

This also works with end-tag omission, like this:

<ul>
   <li>Item A
   <li>Item B
   <li>Item C
   <li>Item D
</ul>

which is potentially something a regular expression could really struggle with.
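
As a rough illustration of that struggle, here is a minimal sketch that runs a naive pair-matching pattern against both forms of the markup. The pattern and the names `liPattern`, `withEndTags`, and `withoutEndTags` are assumptions made up for this example, not part of any library.

```javascript
// Naive pattern (illustrative assumption): expects every <li> to have a matching </li>
const liPattern = /<li>(.*?)<\/li>/g;

const withEndTags = "<ul><li>Item A</li><li>Item B</li><li>Item C</li></ul>";
const withoutEndTags = "<ul><li>Item A<li>Item B<li>Item C</ul>";

// With a global regex, match() returns an array of all matches,
// or null when nothing matches at all
const found = withEndTags.match(liPattern);
const missed = withoutEndTags.match(liPattern);

console.log(found);  // three <li>...</li> strings
console.log(missed); // null, because no </li> is ever present
```

The second input is still valid HTML, so a parser handles it without trouble, while the pattern would need to be rewritten to guess where each item ends.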

CodePudding user response：

You can do:

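const str = `<ul><li>Item A</li><li>Item B</li><li>Item C</li><li>Item D</li></ul>`
// Build a global, non-greedy pattern for a simple tag pair without attributes
const re = tag => new RegExp(`<${tag}>(.*?)</${tag}>`, 'g')
// replace() passes the whole match first, then the captured inner HTML
const ul = str.replace(re('ul'), (match, inner) => {
  // Keep only the first two <li> elements (match() returns null if there are none)
  const li = (inner.match(re('li')) || []).slice(0, 2).join('')
  return match.replace(inner, li)
})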

console.log(ul)
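
If you want this for the formatted, multi-line version of the list as well, the same idea could be wrapped up roughly as follows. This is only a sketch under stated assumptions: `keepFirstItems` is a hypothetical helper name, and it assumes flat lists with explicit end tags, no attributes, and no nesting.

```javascript
// Hypothetical helper: keep only the first `count` <li> items of each <ul>.
// Assumes simple, un-nested markup without attributes and with explicit end tags.
function keepFirstItems(html, count) {
  // [\s\S]*? also matches across line breaks, unlike .*?
  const ulPattern = /<ul>([\s\S]*?)<\/ul>/g;
  const liPattern = /<li>[\s\S]*?<\/li>/g;

  return html.replace(ulPattern, (match, inner) => {
    // match() returns null when the list has no <li> items at all
    const items = inner.match(liPattern) || [];
    return `<ul>${items.slice(0, count).join("")}</ul>`;
  });
}

const html = "<ul><li>Item A</li><li>Item B</li><li>Item C</li><li>Item D</li></ul>";
console.log(keepFirstItems(html, 2));
// <ul><li>Item A</li><li>Item B</li></ul>
```

Note that whitespace between the items is dropped in the result, and anything beyond this simple shape (attributes, nested lists, omitted end tags) is better left to a real HTML parser.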