I have an app that converts Markdown files to HTML using marked.js and displays the converted HTML on the web page. In the code snippet below, I iterate over the text nodes to collect all the raw text values that get displayed and store them in an index where each text value corresponds to a numeric id:
// iterate over file objects which contain raw markdown
files.forEach(file => {
  // convert file's markdown string to html string using marked.js
  const htmlString = marked(file.markDown);
  const parser = new DOMParser();
  const doc = parser.parseFromString(htmlString, 'text/html');
  const walker = document.createTreeWalker(doc, NodeFilter.SHOW_TEXT);
  // const walker = document.createTreeWalker(doc, NodeFilter.SHOW_ELEMENT);
  let currentNode = walker.currentNode;
  while (currentNode != null) {
    if (currentNode.textContent != null) {
      // index every HTML element's raw text value and assign it to an id
      searchIndexReference.push({ id: id, text: currentNode.textContent });
      //console.log('currentNode');
      //console.log(currentNode);
    }
    const nextNode = walker.nextNode();
    if (nextNode != null) {
      currentNode = nextNode;
    } else {
      break;
    }
  }
});
I want to know how I can grab the id of each header and add it to every index entry that follows, until the next header is encountered. That way, searchIndexReference will link each text entry to the header it appears under.
Say we have the HTML below:
<h1 id="top-most-header">Top Most Header</h1>
<p>Here is some text 1</p>
<p>Here is some text 2</p>
<h2 id="some-other-section-header">Some Other Section Header</h2>
<p>Here is some text 3</p>
<p>Here is some text 4</p>
These entries would get appended to searchIndexReference like so (the current header id is kept until the next header is encountered):
{id: 1, headerId: 'top-most-header', text: 'Top Most Header'}
{id: 2, headerId: 'top-most-header', text: 'Here is some text 1'}
{id: 3, headerId: 'top-most-header', text: 'Here is some text 2'}
{id: 4, headerId: 'some-other-section-header', text: 'Some Other Section Header'}
{id: 5, headerId: 'some-other-section-header', text: 'Here is some text 3'}
{id: 6, headerId: 'some-other-section-header', text: 'Here is some text 4'}
This should work for nested elements as well, such as ul, li, etc.
I know when I print out currentNode using
const walker = document.createTreeWalker(doc, NodeFilter.SHOW_ELEMENT);
instead of NodeFilter.SHOW_TEXT it shows the full HTML header element with the id, but I'm not sure where to go from there.
[Image: output of printing currentNode with SHOW_ELEMENT]
CodePudding user response:
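Check the text node's parent element, and if it is a header (h1 to h6, not only h1), save its id. Keep headerId in a variable declared outside the loop so it carries over to the following entries until the next header is found:
const parent = currentNode.parentElement;
// nodeName is uppercase for elements in an HTML document
if (parent && /^H[1-6]$/.test(parent.nodeName)) {
  headerId = parent.id;
}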
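
Another way to do this, instead of checking the parent, is to let the walker visit both elements and text nodes and update the header id whenever a header element goes by. This is only a rough sketch, not tested against your app. The function name indexByHeader is made up for illustration, deriving id from the array length is an assumption about how you number entries, and skipping whitespace-only text nodes is my own addition. It should also cope with text nested inside a header (for example an em inside an h2) and with ul/li content, since the header id is set when the header element itself is visited:

```javascript
// Hypothetical helper (not part of marked.js or the DOM API).
// `walker` is anything with a nextNode() method, for example:
//   document.createTreeWalker(
//     doc.body, NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT)
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3;    // same value as Node.TEXT_NODE

function indexByHeader(walker, searchIndexReference = []) {
  // id of the most recent h1-h6; stays null until the first header is seen
  let headerId = null;
  let node;
  while ((node = walker.nextNode()) !== null) {
    if (node.nodeType === ELEMENT_NODE) {
      // A header element starts a new section
      if (/^H[1-6]$/.test(node.nodeName)) {
        headerId = node.id;
      }
    } else if (node.nodeType === TEXT_NODE) {
      const text = node.textContent.trim();
      // Skip whitespace-only nodes such as newlines between block elements
      if (text !== '') {
        // Assumption: ids are sequential and start at 1
        const id = searchIndexReference.length + 1;
        searchIndexReference.push({ id, headerId, text });
      }
    }
  }
  return searchIndexReference;
}
```

Inside your files.forEach you would create the walker with NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT and call indexByHeader(walker, searchIndexReference). Passing the same array each time keeps the ids counting up across files. Note that headerId starts at null for every call, so text before the first header of a file gets headerId: null.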