Need to check if processing instruction `<?covid19?>` is present in XML or not

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1d1 20130915//EN" "JATS-journalpublishing1.dtd"[]>
<article dtd-version="1.1d1" article-type="review-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en">
<front>
<?covid19?>

I need to find out whether the `<?covid19?>` processing instruction is present in the XML.

Pseudo-code in jQuery:

$("<?covid19?>").length

User response:

There are no CSS selectors that match processing instructions the way they match elements. However, two DOM APIs let you find them without walking the DOM tree manually: NodeIterator (and its close relative TreeWalker) and XPath.

The examples assume your XML document is an XMLDocument named theDocument. If you only have the XML as a string, you can create one with the DOMParser API:

const xml = `<?xml version="1.0" encoding="UTF-8"?>
<root>
  <?covid19 content-a?>
  <?covid19 content-b?>
  <?thing x?>
  <child>
    <?covid19 content-c?>
    <?thing y?>
  </child>
  <covid19>
    Reject this node.
  </covid19>
</root>`,
  theDocument = new DOMParser().parseFromString(xml, "text/xml");
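DOMParser does not throw on malformed XML. Instead, browsers return a document that contains a parsererror element. The following is a minimal sketch of a guard for that case. parseXMLStrict is a hypothetical helper name, and the sketch assumes your own XML never uses an element called parsererror.

```javascript
// Hypothetical helper: parse an XML string and throw if the parser reported an error.
// DOMParser signals malformed XML by putting a <parsererror> element into the
// returned document instead of throwing, so we look for that element.
function parseXMLStrict(xmlString) {
  const doc = new DOMParser().parseFromString(xmlString, "text/xml");
  const errors = doc.getElementsByTagName("parsererror");

  if (errors.length > 0) {
    throw new Error("Invalid XML: " + errors[0].textContent);
  }

  return doc;
}

// Usage (in a browser): const theDocument = parseXMLStrict(xml);
```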

`NodeIterator` API

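You can find all processing instructions with the NodeIterator API via theDocument.createNodeIterator. Passing NodeFilter.SHOW_PROCESSING_INSTRUCTION as the second argument limits the iterator to processing instructions. An optional third argument (a filter) can narrow the results down by name: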

const iteratorAll = theDocument
    .createNodeIterator(theDocument, NodeFilter.SHOW_PROCESSING_INSTRUCTION);

const iteratorCOVID19 = theDocument
    .createNodeIterator(theDocument, NodeFilter.SHOW_PROCESSING_INSTRUCTION, {
      acceptNode(node){
        if(node.nodeName.toLowerCase() === "covid19"){
          return NodeFilter.FILTER_ACCEPT;
        }

        return NodeFilter.FILTER_SKIP;
      }
    });
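For a plain yes/no answer, you don't need to collect every match. nextNode() returns null when nothing is left, so the first call already tells you whether a match exists.

The following is a minimal sketch; hasProcessingInstruction is a hypothetical name. Unlike the filter above, which lowercases nodeName, it compares ProcessingInstruction.target case-sensitively. That matches how XML treats names and how XPath's processing-instruction('covid19') behaves.

```javascript
// Hypothetical helper: true if `doc` contains at least one processing
// instruction whose target is exactly `target` (case-sensitive, as in XML).
function hasProcessingInstruction(doc, target) {
  const iterator = doc.createNodeIterator(doc, NodeFilter.SHOW_PROCESSING_INSTRUCTION, {
    acceptNode(node) {
      return node.target === target
        ? NodeFilter.FILTER_ACCEPT
        : NodeFilter.FILTER_SKIP;
    }
  });

  // nextNode() returns the first accepted node, or null if there is none.
  return iterator.nextNode() !== null;
}

// Usage: hasProcessingInstruction(theDocument, "covid19") is true for the sample XML above.
```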

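iteratorAll is a NodeIterator over all processing instructions. iteratorCOVID19 is a NodeIterator over only the processing instructions named covid19. Because the filter lowercases nodeName first, it would also accept `<?COVID19?>`, even though XML names are case-sensitive.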

The TreeWalker API (via theDocument.createTreeWalker) is very similar: it takes the same arguments and also provides a nextNode method.
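Here is a sketch of the same lookup with a TreeWalker; firstProcessingInstruction is a hypothetical name. It returns the first matching node, or null if nothing matches. Nodes excluded by whatToShow (such as elements) are treated as skipped rather than rejected, so the walker still descends into `<child>`.

```javascript
// Hypothetical helper: return the first processing instruction with the given
// target, or null. Non-PI nodes are filtered out by whatToShow and treated as
// skipped, so the walker still visits their descendants.
function firstProcessingInstruction(doc, target) {
  const walker = doc.createTreeWalker(doc, NodeFilter.SHOW_PROCESSING_INSTRUCTION, {
    acceptNode(node) {
      return node.target === target
        ? NodeFilter.FILTER_ACCEPT
        : NodeFilter.FILTER_SKIP;
    }
  });

  // After this call, walker.currentNode points at the returned node,
  // or stays at the root if nothing matched.
  return walker.nextNode();
}

// Usage: firstProcessingInstruction(theDocument, "covid19").data is "content-a"
// for the sample XML above.
```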

XPath Iterator API

Finding processing instructions is also possible with XPath, via theDocument.evaluate, which returns an XPathResult:

const xPathAll = theDocument
    .evaluate("//processing-instruction()", theDocument, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE);

const xPathCOVID19 = theDocument
    .evaluate("//processing-instruction('covid19')", theDocument, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE);

Explanation of the XPath syntax:

Token	Meaning
`//`	Search all descendants of the document root, i.e. the whole document
`processing-instruction()`	Get nodes of type “processing instruction”
`processing-instruction('covid19')`	Get nodes of type “processing instruction” with the node name `covid19`
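XPath can also do the counting itself. The following sketch evaluates count(...) with XPathResult.NUMBER_TYPE; countProcessingInstructions is a hypothetical name.

It uses `.//` rather than `//`. A leading `//` always searches from the document root, even when a different context node is passed. With `.//`, the search is limited to the context node's descendants, for example the `<front>` element from the question.

The sketch assumes target is a valid PI name. Such names cannot contain an apostrophe, so embedding target in a single-quoted XPath string is safe.

```javascript
// Hypothetical helper: count processing instructions named `target` below
// `contextNode` (defaults to the whole document).
function countProcessingInstructions(doc, target, contextNode = doc) {
  // ".//" searches descendants of contextNode; a leading "//" would always
  // start from the document root. Assumes `target` contains no "'".
  const result = doc.evaluate(
    `count(.//processing-instruction('${target}'))`,
    contextNode,
    null,
    XPathResult.NUMBER_TYPE
  );

  return result.numberValue;
}

// Usage: countProcessingInstructions(theDocument, "covid19") returns 3 for the
// sample XML; with theDocument.querySelector("child") as contextNode it returns 1.
```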

XPathResult.ORDERED_NODE_ITERATOR_TYPE guarantees that the nodes are returned in document order; XPathResult.UNORDERED_NODE_ITERATOR_TYPE does not.

xPathAll is an XPathResult iterator over all processing instructions. xPathCOVID19 is an XPathResult iterator over only the processing instructions named covid19.
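If you only need the first match, XPathResult.FIRST_ORDERED_NODE_TYPE avoids an iterator altogether. The node is available as singleNodeValue, which is null when nothing matches. Here is a minimal sketch; firstMatch is a hypothetical name.

```javascript
// Hypothetical helper: first node (in document order) matching `expression`, or null.
function firstMatch(doc, expression, contextNode = doc) {
  return doc.evaluate(
    expression,
    contextNode,
    null,
    XPathResult.FIRST_ORDERED_NODE_TYPE
  ).singleNodeValue;
}

// Usage: firstMatch(theDocument, "//processing-instruction('covid19')") returns the
// <?covid19 content-a?> node for the sample XML, so `!== null` works as an existence check.
```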

Iteration helper

Both APIs have excellent browser support, but they are old enough to predate JavaScript's iteration protocol. Neither a NodeIterator nor an XPathResult can be passed to for…of or Array.from directly. This is where a generator proves useful.

The generator function consumeDOMIterator below fully consumes all nodes found by either iterator. NodeIterator (and TreeWalker) return the next result from nextNode, whereas an XPathResult iterator uses iterateNext, so the function first determines which method to call. If neither applies, it falls back to the default iteration protocol. A while loop then calls the chosen method repeatedly and yields each node until it returns null.

function* consumeDOMIterator(iterator){
  const method = (iterator instanceof NodeIterator || iterator instanceof TreeWalker
    ? "nextNode"
    : iterator instanceof XPathResult && [
      XPathResult.UNORDERED_NODE_ITERATOR_TYPE,
      XPathResult.ORDERED_NODE_ITERATOR_TYPE
    ].includes(iterator.resultType)
    ? "iterateNext"
    : null);
  
  if(!method){
    yield* iterator[Symbol.iterator]();
    
    return;
  }
  
  let node;
  
  while((node = iterator[method]())){
    yield node;
  }
}
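As a variation, the helper could check for the methods themselves instead of relying on instanceof, and could also accept snapshot results. instanceof fails for objects from another realm, such as a document inside an iframe. The following sketch makes those assumptions; iterateDOMResult is a hypothetical name.

```javascript
// Hypothetical variant of consumeDOMIterator: checks for methods (duck typing)
// instead of instanceof, and also accepts snapshot XPathResults.
function* iterateDOMResult(result) {
  // NodeIterator and TreeWalker both expose nextNode().
  if (typeof result.nextNode === "function") {
    let node;
    while ((node = result.nextNode())) {
      yield node;
    }
    return;
  }

  // Every XPathResult has both iterateNext() and snapshotItem(), but only the
  // one matching its resultType may be called, so switch on resultType.
  switch (result.resultType) {
    case XPathResult.UNORDERED_NODE_ITERATOR_TYPE:
    case XPathResult.ORDERED_NODE_ITERATOR_TYPE: {
      let node;
      while ((node = result.iterateNext())) {
        yield node;
      }
      return;
    }
    case XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE:
    case XPathResult.ORDERED_NODE_SNAPSHOT_TYPE:
      for (let index = 0; index < result.snapshotLength; index++) {
        yield result.snapshotItem(index);
      }
      return;
    default:
      // Anything else, e.g. an Array or a NodeList, uses the iteration protocol.
      yield* result;
  }
}

// Usage: Array.from(iterateDOMResult(snapshotCOVID19), (pi) => pi.data)
// gives ["content-a", "content-b", "content-c"] for the sample XML.
```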

With this helper, Array.from can turn any of the iterators into an Array:

Array.from(consumeDOMIterator(iteratorCOVID19))

// Or any of these:
Array.from(consumeDOMIterator(xPathCOVID19))
Array.from(consumeDOMIterator(iteratorAll))
Array.from(consumeDOMIterator(xPathAll))

To check whether a processing instruction exists, check whether the Array’s length is greater than 0. Alternatively, without building an Array, check whether iteratorCOVID19.nextNode() or xPathCOVID19.iterateNext() returns a Node instead of null.
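A small generic helper can express the same check for any iterable, including the consumeDOMIterator generator; hasAny is a hypothetical name. It stops after the first item instead of building an Array.

```javascript
// Hypothetical helper: true if the iterable produces at least one value.
// It reads a single value; for a generator, the early return also closes it.
function hasAny(iterable) {
  for (const _ of iterable) {
    return true;
  }
  return false;
}

// Usage: hasAny(consumeDOMIterator(xPathCOVID19)) is true for the sample XML,
// provided xPathCOVID19 has not been consumed yet. It still advances the
// underlying DOM iterator by one node.
```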

Note that the name includes “consume” for a reason: iterating with this function advances the state of the underlying results. Once Array.from reaches the end, either iterator sits at the “end” of the document, so there is no next node. NodeIterator has a previousNode method for stepping back, but the XPath iterator API has no equivalent. In general, iterators can only be iterated once.
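Plain generators have the same single-use behavior. The following is a DOM-free illustration; the values only mimic the sample's PI data.

```javascript
// A generator object, like the DOM iterators it wraps, is exhausted after one pass.
function* sampleValues() {
  yield "content-a";
  yield "content-b";
}

const once = sampleValues();
const firstPass = Array.from(once);  // ["content-a", "content-b"]
const secondPass = Array.from(once); // [] because the iterator is already at its end

// To iterate again, create a fresh iterator for each pass, e.g. by wrapping
// theDocument.createNodeIterator(...) or theDocument.evaluate(...) in a function.
const makeValues = () => sampleValues();
const freshPass = Array.from(makeValues()); // ["content-a", "content-b"]
```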

XPath Snapshot API

Alternatively, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE returns a snapshot: a fixed, indexed list of the matching nodes that can be read as many times as needed.

const snapshotAll = theDocument
    .evaluate("//processing-instruction()", theDocument, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE);

const snapshotCOVID19 = theDocument
    .evaluate("//processing-instruction('covid19')", theDocument, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE);

Because the XPathResult is a snapshot, snapshotLength and snapshotItem(index) give direct access to every matching node. Again, Array.from makes it easy to collect them into an Array:

Array.from({
  length: snapshotCOVID19.snapshotLength
}, (_, index) => snapshotCOVID19.snapshotItem(index));

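Unlike the iterator result types, a snapshot stays usable after the underlying document is mutated. It keeps referring to the nodes that matched at evaluation time, which may no longer reflect the document. An iterator XPathResult, by contrast, becomes invalid, and iterateNext() then throws an InvalidStateError. A snapshot can also be read by index, as often as needed.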

To check whether a processing instruction exists, check the Array’s length. Alternatively, skip the Array and test snapshotCOVID19.snapshotLength > 0, or check whether snapshotCOVID19.snapshotItem(0) returns a Node instead of null.
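The Array.from call above can be wrapped in a small reusable helper; snapshotToArray is a hypothetical name. It relies only on snapshotLength and snapshotItem, so it works for both ORDERED_NODE_SNAPSHOT_TYPE and UNORDERED_NODE_SNAPSHOT_TYPE results.

```javascript
// Hypothetical helper: copy every node of a snapshot XPathResult into an Array.
// Array.from with a { length } object calls the mapper once for each index
// from 0 to snapshotLength - 1.
function snapshotToArray(snapshot) {
  return Array.from(
    { length: snapshot.snapshotLength },
    (_, index) => snapshot.snapshotItem(index)
  );
}

// Usage: snapshotToArray(snapshotCOVID19).map((pi) => pi.data)
// gives ["content-a", "content-b", "content-c"] for the sample XML.
```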

Full code

This snippet demonstrates the full approach with each of the three APIs. It lists all processing instructions as nodeName and nodeValue pairs, and the nodeValue of every <?covid19?> processing instruction:

function* consumeDOMIterator(iterator){
  const method = (iterator instanceof NodeIterator || iterator instanceof TreeWalker
    ? "nextNode"
    : iterator instanceof XPathResult && [
      XPathResult.UNORDERED_NODE_ITERATOR_TYPE,
      XPathResult.ORDERED_NODE_ITERATOR_TYPE
    ].includes(iterator.resultType)
    ? "iterateNext"
    : null);
  
  if(!method){
    yield* iterator[Symbol.iterator]();
    
    return;
  }
  
  let node;
  
  while((node = iterator[method]())){
    yield node;
  }
}

const xml = `<?xml version="1.0" encoding="UTF-8"?>
<root>
  <?covid19 content-a?>
  <?covid19 content-b?>
  <?thing x?>
  <child>
    <?covid19 content-c?>
    <?thing y?>
  </child>
  <covid19>
    Reject this node.
  </covid19>
</root>`,
  theDocument = new DOMParser().parseFromString(xml, "text/xml"),
  iteratorAll = theDocument
    .createNodeIterator(theDocument, NodeFilter.SHOW_PROCESSING_INSTRUCTION),
  iteratorCOVID19 = theDocument
    .createNodeIterator(theDocument, NodeFilter.SHOW_PROCESSING_INSTRUCTION, {
      acceptNode(node){
        if(node.nodeName.toLowerCase() === "covid19"){
          return NodeFilter.FILTER_ACCEPT;
        }

        return NodeFilter.FILTER_SKIP;
      }
    }),
  xPathAll = theDocument
    .evaluate("//processing-instruction()", theDocument, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE),
  xPathCOVID19 = theDocument
    .evaluate("//processing-instruction('covid19')", theDocument, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE),
  snapshotAll = theDocument
    .evaluate("//processing-instruction()", theDocument, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE),
  snapshotCOVID19 = theDocument
    .evaluate("//processing-instruction('covid19')", theDocument, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE);

const demoIterator = (iterator) => Array.from(consumeDOMIterator(iterator), ({ nodeName, nodeValue }) => [
  nodeName,
  nodeValue
]);
const demoIterator2 = (iterator) => Array.from(consumeDOMIterator(iterator), ({ nodeValue }) => nodeValue);

console.log("All PIs using NodeIterator", demoIterator(iteratorAll));
console.log("All PIs using XPath Iterator", demoIterator(xPathAll));
console.log("All PIs using XPath Snapshot", Array.from({
  length: snapshotAll.snapshotLength
}, (_, index) => {
  const {
      nodeName,
      nodeValue
    } = snapshotAll.snapshotItem(index);
  
  return [
    nodeName,
    nodeValue
  ];
}));

console.log("All node values of <?covid19?> PIs using NodeIterator", demoIterator2(iteratorCOVID19));
console.log("All node values of <?covid19?> PIs using XPath Iterator", demoIterator2(xPathCOVID19));
console.log("All node values of <?covid19?> PIs using XPath Snapshot", Array.from({
  length: snapshotCOVID19.snapshotLength
}, (_, index) => snapshotCOVID19.snapshotItem(index).nodeValue));


User response:

This seems like a typical use case for XPath.

XPath allows you to query XML in a very flexible way.

This tutorial could help:

https://www.w3schools.com/xml/xpath_intro.asp
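For a plain yes/no answer, XPath's boolean() function returns true when a node-set is non-empty. The following is a minimal sketch, assuming the XML has already been parsed into a document. hasCovid19PI is a hypothetical name.

```javascript
// Hypothetical helper: true if `doc` contains at least one <?covid19 ...?>
// processing instruction anywhere in the document.
function hasCovid19PI(doc) {
  return doc.evaluate(
    "boolean(//processing-instruction('covid19'))",
    doc,
    null,
    XPathResult.BOOLEAN_TYPE
  ).booleanValue;
}

// Usage (in a browser):
// const doc = new DOMParser().parseFromString(xmlString, "text/xml");
// hasCovid19PI(doc);
```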
