Orphan <li> and </li> tags

Time:10-13

I inherited some HTML in which orphan list item tags, <li> and </li>, surround certain paragraphs without any enclosing <ul> or <ol> tags.

Is there a way to parse the files, or to run a find/replace en masse across more than 700 HTML files, to get rid of these orphan tags?

The files also have plenty of legitimate chunks where the same tags are opened and closed normally, and those are fine, so I mustn't alter them.

I'm comfortable with regex, Notepad, and Excel, among others :)

Any help is much appreciated.

For clarification, below is a short sample from one of the affected files. The orphan list items are the ones in the middle, with blank lines before and after:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="MS-HKWD" content="Commandes simplifiées:Créer une commande simplifiée" />
<meta name="topic-check-list" content="Index dynamique exécuté" />
<meta name="generator" content="Adobe RoboHelp 2019" />
<title>Purchase Module</title>
<link href="..\Model.css" rel="stylesheet" type="text/css" />
</head>

<body>
<p >Create a basic order</p>

<hr style="color: #002c52; background-color: #002c52;" width="103.064%" align="left" />
<p >How to</p>



<li><p>In left panel, click on <b>ADD</b>.<br />
&gt; An empty record is displayed.</p></li>
<li><p>Define the basic order.<br />
This mainly involves:</p></li>




<img src="../Icones/paperclip.png" alt="paperclip" border="0" /> These fields are required.</p>
<ul style="list-style-type: square;">
    <li style="text-align: left;"><p><b>Nb</b>: order number.</p></li>
    <li style="text-align: left;"><p><b>Status</b>*: current status of the order.</p></li>
    <li style="text-align: left;"><p><b>Destination</b>: department requesting the goods.</p></li>
    <li style="text-align: left;"><p><b>Supplier*</b>: 4 digit supplier code.
    </p>
</ul>

</body>
</html>

Answer:

Parse the HTML with a DOM parser, use the DOM API to find the orphaned li elements and remove them, then serialize the result back to HTML:

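function removeOrphanListItems(html) {
    // Parse the markup into a detached document (DOMParser is a browser API).
    const doc = new DOMParser().parseFromString(html, "text/html");
    for (const li of doc.querySelectorAll("li")) {
        // An <li> is an orphan when its parent is neither <ul> nor <ol>.
        if (!["UL", "OL"].includes(li.parentNode.tagName)) {
            li.remove(); // removes the element together with its contents
        }
    }
    // Serialize the body only; the doctype and <head> are not included.
    return doc.body.innerHTML;
}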

let html = "<ul><li>one<li>two</ul><div><li>three</li></div>";
console.log(removeOrphanListItems(html));
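
A caveat worth noting: li.remove() deletes each orphan element together with the paragraph inside it, and doc.body.innerHTML returns only the body, so the XML declaration, doctype, and head of a full file are not in the result. If the goal is to drop only the <li> and </li> tags and leave everything else in the file untouched, a tag-scanning pass may be an alternative. The sketch below is not the DOM approach described above; it is a regex-based scan that counts open <ul>/<ol> elements and deletes list item tags seen at depth zero. The name stripOrphanListItemTags is hypothetical, and the sketch assumes that no attribute value contains ">" and that no list tags are hidden inside comments, CDATA sections, or scripts.

```javascript
// Sketch (regex scan, not a DOM parser): delete <li> and </li> tags that
// occur outside every <ul>/<ol>, and keep all other text exactly as it was.
// Assumptions: no ">" inside attribute values; no list tags in comments,
// CDATA sections, or <script> blocks.
function stripOrphanListItemTags(html) {
    let listDepth = 0; // number of <ul>/<ol> elements currently open

    // \b keeps <link>, <listing>, etc. from matching as "li".
    return html.replace(/<(\/?)(ul|ol|li)\b[^>]*>/gi, (tag, slash, name) => {
        if (name.toLowerCase() === "li") {
            // Inside a list: legitimate item, keep it. Outside: orphan, drop it.
            return listDepth > 0 ? tag : "";
        }
        // <ul>/<ol> open or close: adjust the depth and keep the tag.
        listDepth = slash ? Math.max(0, listDepth - 1) : listDepth + 1;
        return tag;
    });
}

const sample = "<ul><li>one</li></ul><div><li><p>three</p></li></div>";
console.log(stripOrphanListItemTags(sample));
// prints: <ul><li>one</li></ul><div><p>three</p></div>
```

Because only the matched tags are replaced, the paragraphs inside the orphan items survive and the legitimate lists are returned unchanged. Checking the result against a few files by hand before running it on all of them would be prudent.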

After the edit to your question:

It became clear that the HTML in question is the content of the page on which the script runs. In that case, you can apply the same logic directly to the DOM of that page:

function removeOrphanListItems() {
    for (const li of document.querySelectorAll("li")) {
        // Orphan: the parent is neither <ul> nor <ol>.
        if (!["UL", "OL"].includes(li.parentNode.tagName)) {
            li.remove();
        }
    }
}
// Run once the page's DOM has been parsed.
document.addEventListener("DOMContentLoaded", removeOrphanListItems);
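
Since the question is about more than 700 files on disk, the fix may also need to run outside the browser. The following is a minimal Node.js sketch of the batch step only, assuming the files are UTF-8 (as the sample's charset declaration suggests) and end in .htm or .html. The name rewriteHtmlFiles and its transform parameter are hypothetical; transform stands for whatever string-to-string fix you settle on. DOMParser is a browser API and is not available as a Node.js global, so the DOM-based function would need a DOM library there.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: apply `transform` (a string -> string function) to
// every .htm/.html file under `dir`, recursing into subdirectories.
// Returns the paths of the files whose content actually changed.
function rewriteHtmlFiles(dir, transform) {
    const changed = [];
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
        const fullPath = path.join(dir, entry.name);
        if (entry.isDirectory()) {
            changed.push(...rewriteHtmlFiles(fullPath, transform));
        } else if (/\.html?$/i.test(entry.name)) {
            const before = fs.readFileSync(fullPath, "utf8"); // assumes UTF-8
            const after = transform(before);
            if (after !== before) {
                // Write only when something changed, so untouched files
                // keep their original timestamps.
                fs.writeFileSync(fullPath, after, "utf8");
                changed.push(fullPath);
            }
        }
    }
    return changed;
}
```

The function overwrites files in place, so running it on a copy of the folder first is the safer option. The returned list shows which files were touched and can be compared against what you expected to change.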
