Finding position of dom node in the document source

Time:07-30

Context

I'm building a set of 'extractor' functions that pull anything resembling a component out of a page (using jsdom and Node.js). The final result should be a list of these 'component' objects, ordered by where they originally appeared in the page.

Problem

The last part of this process is a bit problematic. As far as I can see, there's no easy way to tell where a given element is in a given dom document's source code.

The numeric depth or css/xpath-like path also doesn't feel helpful in this case.

Example

With the given extractors...

const extractors = [

  // Extract buttons
  dom => 
    Array.from(dom.window.document.querySelectorAll('button'))
    .map(elem => ({
      type: 'button',
      name: elem.textContent,
      position: ???,
    })),

  // Extract links
  dom => 
    Array.from(dom.window.document.querySelectorAll('a'))
    .map(elem => ({
      type: 'link',
      name: elem.textContent,
      position: ???,
      link: elem.href,
    })),

];

...and the given document (I know, it's an ugly and unsemantic example):

<html>
  <body>
    <a href="/">Home</a>
    <button>Login</button>
    <a href="/about">About</a>
...

I need something like:

[
  { type: 'button', name: 'Login', position: 45, ... },
  { type: 'link', name: 'Home', position: 20, ... },
  { type: 'link', name: 'About', position: 72, ... },
]

(which can be later ordered by item.position)
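
To make the intended final step concrete, here is a minimal sketch of how the extractor results could be merged and ordered once each object carries a numeric position. The function name extractInOrder is made up for illustration, and how position gets computed is still the open question.

```javascript
// Hypothetical helper: run every extractor, merge the results into one list,
// and order them by their (still to be determined) source position.
// Assumes each extractor returns an array of objects with a numeric `position`.
function extractInOrder(extractors, dom) {
  return extractors
    .flatMap(extract => extract(dom))
    .sort((a, b) => a.position - b.position);
}
```

With the sample positions above, this would put the 'Home' link first, then the 'Login' button, then the 'About' link.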

CodePudding user response:

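Try a TreeWalker. It visits nodes in document order, so a running counter gives each node a position you can sort by (an index in traversal order, not a character offset in the source). This example is a class I wrote that uses one. Note that in jsdom, document and NodeFilter are not globals; take them from dom.window.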

class Tree {
  constructor(document, root, whatToShow) {
    this.doc = document;
    this.root = root;
    this.node = whatToShow;
  }
  walker(filter, expandEntity = false) {
    let SHOW = this.node === 1 ? NodeFilter.SHOW_ELEMENT : this.node === 3 ? NodeFilter.SHOW_TEXT : NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_TEXT;
    return this.doc.createTreeWalker(this.root, SHOW, filter, expandEntity);
  }
  walk() {
    let position = 0;
    const DOM = [];
    const W = this.walker();
    while (W.nextNode()) {
      const node = W.currentNode;
      DOM.push({
        type: node.tagName,      // undefined for text nodes
        name: node.textContent,
        position: position++     // running index in document order
      });
    }
    return DOM;
  }
}

const w = new Tree(document, document.body, 1);
console.log(w.walk());

HTML used by the snippet above:

<body>

<input type="checkbox" id="nav-trigger"  />
<label for="nav-trigger" >
  <div ></div>
</label>

<label for="nav-trigger" ></label>

<div id="main">
    <section >
        <article><h1>js_utils</h1>
<p>This library is a collection of useful functions that we don't want to keep writing again and again.
It also serves the purpose to help manipulate the DOM when a framework like react.js is not needed or would be overkill.</p>
<h2>npm scripts</h2>
<pre ><code>  {
    &quot;build&quot;: &quot;node ./node_modules/webpack/bin/webpack.js --config ./webpack.config.js ./src/index.js&quot;,
    &quot;generate-docs&quot;: &quot;node_modules/.bin/jsdoc --configure .jsdoc.json --verbose&quot;,
    &quot;test&quot;: &quot;jest&quot;,
    &quot;test-verbose&quot;: &quot;jest --coverage --config ./jest.config.js&quot;
  }
</code></pre>
<ul>
<li><strong>build</strong>: compile source code to generate the complete library</li>
<li><strong>generate-docs</strong>: generate documentation using jsdoc comments</li>
<li><strong>test</strong>: run jest tests</li>
<li><strong>test-verbose</strong>: run tests and generate coverage reports</li>
</ul>
<p><a href="https://payouri.github.io/js_utils/index.html">Online Docs</a> <br>
<a href="https://codepen.io/Zorlimar/pen/vQQmOo">Available on CodePen</a></p></article>
    </section>

</div>

<br >

<footer>
    Generated by <a href="https://github.com/jsdoc3/jsdoc">JSDoc 3.6.4</a> on Thu May 28 2020 20:51:24 GMT+0200 (GMT+02:00) using the Minami theme.
</footer>
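
To connect the walker's ordering to the extractors from the question, one option is to build a lookup from element to traversal index once, then read it inside each extractor. The sketch below is only an illustration: indexByDocumentOrder is a made-up name, and the numeric constant stands in for NodeFilter.SHOW_ELEMENT because Node.js has no global NodeFilter.

```javascript
// NodeFilter.SHOW_ELEMENT is 1 in the DOM standard; the literal is used here
// because Node.js has no global NodeFilter (in jsdom it is dom.window.NodeFilter).
const SHOW_ELEMENT = 1;

// Hypothetical helper: maps every element under `root` to its index in
// document order. `root` itself is not included, since nextNode() moves
// to the first node after it.
function indexByDocumentOrder(doc, root) {
  const order = new Map();
  const walker = doc.createTreeWalker(root, SHOW_ELEMENT);
  let index = 0;
  let node;
  while ((node = walker.nextNode())) {
    order.set(node, index++);
  }
  return order;
}

// Usage inside an extractor (assuming `dom` is a JSDOM instance):
//   const doc = dom.window.document;
//   const order = indexByDocumentOrder(doc, doc.body);
//   ... position: order.get(elem) ...
```

The resulting numbers are not source offsets, but they sort the same way, which is all the ordering step needs.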

CodePudding user response:

One possible rough way I can think of is something like:

function findPos(elem){
  elem.setAttribute('data-pf', '1');
  try {
    return elem.ownerDocument.documentElement.outerHTML.indexOf('data-pf');
  } finally {
    elem.removeAttribute('data-pf');
  }
}

see also: https://github.com/jsdom/jsdom#serializing-the-document-with-serialize

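However, on top of being imprecise (the offset refers to the serialized HTML, not the original source), this feels like overkill and may perform poorly, since it serializes the whole document on every call. Unless it's extremely slow, though, that's not a big problem, because this task is a one-time job.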
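
As an alternative to the marker-attribute trick, jsdom can record source locations while parsing: create the JSDOM instance with the includeNodeLocations option and then call dom.nodeLocation(elem), which returns an object whose startOffset is the character offset in the original HTML. Below is a rough sketch built on that; the helper name sourceOffset and the -1 fallback are my own choices for illustration, not part of jsdom.

```javascript
// Hypothetical helper: returns the character offset of `elem` in the original
// HTML, or -1 when jsdom reports no location for it.
// Assumes `dom` was created with { includeNodeLocations: true }.
function sourceOffset(dom, elem) {
  const location = dom.nodeLocation(elem);
  return location ? location.startOffset : -1;
}

// Usage (needs the jsdom package):
//   const { JSDOM } = require('jsdom');
//   const dom = new JSDOM(html, { includeNodeLocations: true });
//   const button = dom.window.document.querySelector('button');
//   const position = sourceOffset(dom, button);
```

This avoids mutating and re-serializing the document, and the offsets refer to the source text that was actually parsed.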
