How to store last encountered header's id from HTML when traversing the DOM in Typescript

Time:12-03

I have a TypeScript/React app that converts Markdown files to HTML using marked.js and displays the converted HTML on the web page. In the code snippet below I iterate over the text nodes to collect every raw text value that gets displayed and store it in 'searchIndexReference', where each text value corresponds to a numeric id.

How can I keep track of the most recently encountered header's id and store it in 'headerReference' for every text value found until the next header id is encountered? For all HTML elements

let id: number = 1;
let headerIdFromHTML = '';
const headerTags = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6'];

files.forEach(file => {
    // convert markdown content to html using marked.js
    const htmlString = marked(file.markDown);
    const parser = new DOMParser();
    const doc = parser.parseFromString(htmlString, 'text/html');
    const walker = document.createTreeWalker(doc, NodeFilter.SHOW_TEXT);

    let currentNode = walker.currentNode;
    // gather raw text from every HTML element
    while (currentNode != null) {
        if (currentNode.textContent != null) {
            if (currentNode.parentElement) {
                // only care about h1-h6
                if (headerTags.includes(currentNode.parentElement.tagName)) {
                    // store most recently seen header id, update when new header encountered
                    headerIdFromHTML = currentNode.parentElement.id;
                }
            }
            searchIndexReference.push({ id, text: currentNode.textContent });
            headerReference.push({ id, source: file.source, headerId: headerIdFromHTML });
            id++;
        }
        const nextNode = walker.nextNode();
        if (nextNode != null) {
            currentNode = nextNode;
        } else {
            break;
        }
    }
});

My current code mostly works, but so far it seems to break on HTML elements that contain an href link or <em>. In the example below, the IDs of the last two elements are not fetched properly.

<h3 id="banner">BANNER</h3>
<h1 id="project-name">Project Name</h1>
<h2 id="quick-links-a-nameproject_linksa">Quick Links <a name="project_links"></a></h2>
<li><h5 id="basic-components"><a href="#project_components">Basic Components</a></h5></li>
<h4 id="general"><em>General</em></h4>

Expected state of headerReference after reading in HTML above:

{id: 1, headerId: 'banner', text: 'BANNER'}
{id: 2, headerId: 'project-name', text: 'Project Name'}
{id: 3, headerId: 'quick-links-a-nameproject_linksa', text: 'Quick Links'}
{id: 4, headerId: 'basic-components', text: 'Basic Components'}
{id: 5, headerId: 'general', text: 'General'}

CodePudding user response:

This should get you at least close to what you want. It focuses on your sample HTML and uses XPath to build an array of arrays:

const headerTags = ["h1", "h2", "h3", "h4", "h5", "h6"];
const headerReference = [];
let counter = 1;
// snapshot every element in the document, in document order
const hits = document.evaluate("//*", document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

for (let i = 0; i < hits.snapshotLength; i++) {
  const hit = hits.snapshotItem(i);
  // keep only the header elements
  if (headerTags.includes(hit.tagName.toLowerCase())) {
    // one row per header: [counter, header id, visible text]
    const row = [counter, hit.id, hit.innerText];
    headerReference.push(row);
    counter += 1;
  }
}
console.log(headerReference);

<app>
  <h3 id="banner">BANNER</h3>
  <h1 id="project-name">Project Name</h1>
  <h2 id="quick-links-a-nameproject_linksa">Quick Links
    <a name="project_links"></a>
  </h2>
  <li>
    <h5 id="basic-components">
      <a href="#project_components">Basic Components</a>
    </h5>
  </li>
  <h4 id="general">
    <em>General</em>
  </h4>
</app>

You will have to modify this to fit your actual files, but it should be a start.
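If you would rather keep the TreeWalker loop from the question, one possible adaptation is sketched below. The likely cause of the missing ids is that the text inside <a> or <em> has that inner element as its parentElement, not the header, so the sketch walks up the ancestor chain until it finds an h1-h6. The names ElementLike, TextNodeLike, enclosingHeaderId and indexTextNodes are hypothetical, introduced only for this example. Trimming the text and skipping whitespace-only nodes are assumptions based on the expected output in the question.

```typescript
// Minimal structural shapes for the DOM members used below. Real Element and
// Text nodes satisfy them, and so do plain objects (handy for testing).
interface ElementLike {
  tagName: string;
  id: string;
  parentElement: ElementLike | null;
}

interface TextNodeLike {
  textContent: string | null;
  parentElement: ElementLike | null;
}

interface HeaderRef {
  id: number;
  headerId: string;
  text: string;
}

// Matches h1-h6 regardless of case (tagName is upper case in HTML documents).
const HEADER_TAG = /^h[1-6]$/i;

// Hypothetical helper: climb from a text node to the nearest h1-h6 ancestor
// and return its id, or null when the text is not inside a header.
function enclosingHeaderId(node: TextNodeLike): string | null {
  for (let el = node.parentElement; el !== null; el = el.parentElement) {
    if (HEADER_TAG.test(el.tagName)) {
      return el.id;
    }
  }
  return null;
}

// Hypothetical helper: give every non-empty text node the id of its enclosing
// header, or of the most recently seen header when it is outside one.
function indexTextNodes(nodes: TextNodeLike[], startId: number = 1): HeaderRef[] {
  const refs: HeaderRef[] = [];
  let lastHeaderId = '';
  let id = startId;
  for (const node of nodes) {
    const text = (node.textContent ?? '').trim();
    if (text === '') {
      continue; // assumption: whitespace between tags should not be indexed
    }
    const headerId = enclosingHeaderId(node);
    if (headerId !== null) {
      lastHeaderId = headerId; // a new header starts here
    }
    refs.push({ id: id++, headerId: lastHeaderId, text });
  }
  return refs;
}
```

In the browser you would collect the nodes returned by walker.nextNode() into an array and pass it to indexTextNodes once per file, feeding the last id plus one back in as startId so the numbering continues across files. Element.closest('h1, h2, h3, h4, h5, h6') on the text node's parentElement is a built-in alternative to the manual ancestor loop.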
