<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1d1 20130915//EN" "JATS-journalpublishing1.dtd"[]>
<article dtd-version="1.1d1" article-type="review-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en">
<front>
<?covid19?>
I need to find out whether the <?covid19?> processing instruction is present in the XML.
Pseudo-code in jQuery:
$("<?covid19?>").length
Answer:
While there are no CSS selectors that can select processing instructions the way they select elements, there are two DOM APIs you can use to avoid walking the DOM tree manually: NodeIterator (or its close relative TreeWalker) and XPath.
Let’s say your XML document is the XMLDocument theDocument; you can create one by parsing the XML string with the DOMParser API:
const xml = `<?xml version="1.0" encoding="UTF-8"?>
<root>
<?covid19 content-a?>
<?covid19 content-b?>
<?thing x?>
<child>
<?covid19 content-c?>
<?thing y?>
</child>
<covid19>
Reject this node.
</covid19>
</root>`,
theDocument = new DOMParser().parseFromString(xml, "text/xml");
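One caveat with this step: DOMParser does not throw on malformed XML. Browsers instead return a document that contains a parsererror element, and searching that document would simply report that no <?covid19?> instruction exists. Below is a minimal sketch of a stricter parse; parseXmlStrict is a hypothetical helper name, and the optional parser parameter is an assumption added only so the function can be exercised with a stand-in object outside a browser.

```javascript
// Hypothetical helper: parse an XML string and fail loudly instead of
// silently searching an error document.
// `parser` defaults to the browser's DOMParser; the parameter exists only so
// a stand-in object can be passed in when testing outside a browser.
function parseXmlStrict(xmlString, parser = new DOMParser()) {
  const doc = parser.parseFromString(xmlString, "text/xml");
  // DOMParser does not throw on malformed XML; browsers return a document
  // containing a <parsererror> element instead.
  const errors = doc.getElementsByTagName("parsererror");
  if (errors.length > 0) {
    throw new Error("XML parse error: " + errors[0].textContent);
  }
  return doc;
}
```

In a browser, theDocument = parseXmlStrict(xml) could take the place of the plain parseFromString call, so a typo in the XML surfaces as an error rather than as a false "not present".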
NodeIterator API
Finding all processing instructions is possible using the NodeIterator API (using theDocument.createNodeIterator).
const iteratorAll = theDocument
.createNodeIterator(theDocument, NodeFilter.SHOW_PROCESSING_INSTRUCTION);
const iteratorCOVID19 = theDocument
.createNodeIterator(theDocument, NodeFilter.SHOW_PROCESSING_INSTRUCTION, {
acceptNode(node){
if(node.nodeName.toLowerCase() === "covid19"){
return NodeFilter.FILTER_ACCEPT;
}
return NodeFilter.FILTER_SKIP;
}
});
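iteratorAll is a NodeIterator that yields all processing instructions.
iteratorCOVID19 is a NodeIterator that yields only the processing instructions whose target (their nodeName) is covid19. Because of the toLowerCase() call, this comparison is case-insensitive, even though XML names are case-sensitive.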
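The filter does not have to be an object with an acceptNode method; createNodeIterator also accepts a plain function. The sketch below uses a hypothetical helper, piTargetFilter, to build such a function and compares the target case-sensitively, which matches how XPath’s processing-instruction('covid19') behaves. The numeric constants are the values the DOM Standard assigns to NodeFilter.FILTER_ACCEPT and NodeFilter.FILTER_SKIP, spelled out so the filter can be tested without a browser.

```javascript
// Same values as NodeFilter.FILTER_ACCEPT and NodeFilter.FILTER_SKIP
// in the DOM Standard; written out so this runs outside a browser too.
const FILTER_ACCEPT = 1;
const FILTER_SKIP = 3;

// Hypothetical helper: returns a filter function that accepts processing
// instructions whose target (nodeName) is exactly `target`.
// XML names are case-sensitive, so <?COVID19?> is NOT accepted here.
function piTargetFilter(target) {
  return (node) => (node.nodeName === target ? FILTER_ACCEPT : FILTER_SKIP);
}

// In a browser, the function can be passed where the acceptNode object was:
// theDocument.createNodeIterator(
//   theDocument, NodeFilter.SHOW_PROCESSING_INSTRUCTION, piTargetFilter("covid19"));
```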
The TreeWalker API (using theDocument.createTreeWalker) is very similar to the NodeIterator API: it takes the same root, whatToShow and filter arguments and also provides a nextNode method.
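As a sketch of the TreeWalker variant, assuming a browser document such as theDocument, the hypothetical function findPIsWithTreeWalker collects all processing instructions with a given target. The numeric constants mirror NodeFilter.SHOW_PROCESSING_INSTRUCTION, FILTER_ACCEPT and FILTER_SKIP. FILTER_SKIP is used for non-matching nodes because, for a TreeWalker, FILTER_REJECT also skips a node’s descendants; processing instructions have none, but SKIP is the safer habit.

```javascript
// Same values as NodeFilter.SHOW_PROCESSING_INSTRUCTION, FILTER_ACCEPT and
// FILTER_SKIP in the DOM Standard.
const SHOW_PROCESSING_INSTRUCTION = 0x40;
const FILTER_ACCEPT = 1;
const FILTER_SKIP = 3;

// Hypothetical helper: collect all processing instructions named `target`
// using a TreeWalker instead of a NodeIterator.
function findPIsWithTreeWalker(doc, target) {
  const walker = doc.createTreeWalker(doc, SHOW_PROCESSING_INSTRUCTION, {
    acceptNode(node) {
      return node.nodeName === target ? FILTER_ACCEPT : FILTER_SKIP;
    }
  });
  const found = [];
  let node;
  // nextNode() returns null once there are no more accepted nodes.
  while ((node = walker.nextNode())) {
    found.push(node);
  }
  return found;
}
```

With the sample XML above, findPIsWithTreeWalker(theDocument, "covid19").map((pi) => pi.nodeValue) would give ["content-a", "content-b", "content-c"], in document order.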
XPath Iterator API
Finding all processing instructions is also possible using XPath (using theDocument.evaluate). The results are XPathResult objects.
const xPathAll = theDocument
.evaluate("//processing-instruction()", theDocument, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE);
const xPathCOVID19 = theDocument
.evaluate("//processing-instruction('covid19')", theDocument, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE);
Explanation of the XPath syntax:
Token | Meaning |
---|---|
// | Get all descendants |
processing-instruction() | Get nodes of type “processing instruction” |
processing-instruction('covid19') | Get nodes of type “processing instruction” with the node name covid19 |
Using XPathResult.ORDERED_NODE_ITERATOR_TYPE ensures that the nodes are returned in document order; UNORDERED_NODE_ITERATOR_TYPE makes no such guarantee.
xPathAll is an XPathResult iterator that yields all processing instructions.
xPathCOVID19 is an XPathResult iterator that yields all processing instructions with the name covid19.
Iteration helper
Both APIs have excellent browser support, but they are old enough that they don’t implement the modern iteration protocol, so they can’t be used directly with for...of or spread syntax. This is where a generator proves useful.
This code defines the generator function consumeDOMIterator, which fully consumes all nodes found by either iterator.
Since the NodeIterator API’s method for getting the next result is called nextNode, and the XPath method is called iterateNext, this function checks which of these method names to use.
If it can’t find an appropriate method, it falls back to the default iteration protocol.
Then, a simple while loop repeatedly calls the chosen method and yields each node until null is returned.
function* consumeDOMIterator(iterator){
const method = (iterator instanceof NodeIterator || iterator instanceof TreeWalker
? "nextNode"
: iterator instanceof XPathResult && [
XPathResult.UNORDERED_NODE_ITERATOR_TYPE,
XPathResult.ORDERED_NODE_ITERATOR_TYPE
].includes(iterator.resultType)
? "iterateNext"
: null);
if(!method){
yield* iterator[Symbol.iterator]();
return;
}
let node;
while((node = iterator[method]())){
yield node;
}
}
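The instanceof checks tie consumeDOMIterator to the NodeIterator, TreeWalker and XPathResult constructors of the current window, so an iterator created in another frame would fall through to the Symbol.iterator branch and throw a TypeError. A possible alternative, sketched below under the hypothetical name consumeDOMIteratorDuckTyped, checks for the methods and for XPathResult.resultType values instead, and additionally reads snapshot results through snapshotLength and snapshotItem. The resultType numbers are the constants XPathResult defines for the node iterator and snapshot types.

```javascript
// Hypothetical duck-typed variant of consumeDOMIterator.
// resultType values defined by XPathResult:
//   4 = UNORDERED_NODE_ITERATOR_TYPE, 5 = ORDERED_NODE_ITERATOR_TYPE,
//   6 = UNORDERED_NODE_SNAPSHOT_TYPE, 7 = ORDERED_NODE_SNAPSHOT_TYPE
function* consumeDOMIteratorDuckTyped(source) {
  if (typeof source.nextNode === "function") {
    // NodeIterator or TreeWalker: consumed as a side effect.
    let node;
    while ((node = source.nextNode())) {
      yield node;
    }
    return;
  }
  if (source.resultType === 4 || source.resultType === 5) {
    // XPathResult iterator: also consumed, and cannot be rewound.
    let node;
    while ((node = source.iterateNext())) {
      yield node;
    }
    return;
  }
  if (source.resultType === 6 || source.resultType === 7) {
    // XPathResult snapshot: indexed access, so it can be read repeatedly.
    for (let i = 0; i < source.snapshotLength; i++) {
      yield source.snapshotItem(i);
    }
    return;
  }
  // Anything else (arrays, NodeLists, ...) uses the built-in iteration protocol.
  yield* source;
}
```

Checking resultType before calling iterateNext matters because every XPathResult has an iterateNext method, but calling it on a snapshot result throws.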
Now the function can be used to create an Array from the iterator; Array.from makes this easy:
Array.from(consumeDOMIterator(iteratorCOVID19))
// Or any of these:
Array.from(consumeDOMIterator(xPathCOVID19))
Array.from(consumeDOMIterator(iteratorAll))
Array.from(consumeDOMIterator(xPathAll))
To check the existence of a processing instruction, simply check the Array’s length, or check whether iteratorCOVID19.nextNode() or xPathCOVID19.iterateNext() returns a Node rather than null.
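If all you need is the yes/no answer the question asks for, XPath can compute it directly with its boolean() function, so no nodes have to be collected or consumed. A minimal sketch, with hasProcessingInstruction as a hypothetical helper name and 3 being the value of XPathResult.BOOLEAN_TYPE:

```javascript
// Same value as XPathResult.BOOLEAN_TYPE; written out so the sketch does not
// depend on the browser global at definition time.
const BOOLEAN_TYPE = 3;

// Hypothetical helper: true if `doc` contains at least one processing
// instruction with the given target. boolean() turns the node-set into
// true (non-empty) or false (empty).
// PI targets are XML names and cannot contain quotes, so interpolating a
// valid target into the single-quoted literal is safe.
function hasProcessingInstruction(doc, target) {
  const expression = `boolean(//processing-instruction('${target}'))`;
  return doc.evaluate(expression, doc, null, BOOLEAN_TYPE, null).booleanValue;
}
```

With the sample document, hasProcessingInstruction(theDocument, "covid19") returns true; a document without any <?covid19?> instruction gives false.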
Note that the function’s name includes consume for a reason: once you start iterating over the API results with this function, for example to create an Array, the state of the results changes.
Once you reach the end, either iterator sits at the “end” of the document, so there is no next node.
NodeIterator (like TreeWalker) has a previousNode method for moving backwards, but the XPath Iterator API has no corresponding method; in general, these iterators can only be iterated once.
XPath Snapshot API
Alternatively, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE may be used to get a fixed, indexed set of results that can be read as often as needed.
const snapshotAll = theDocument
.evaluate("//processing-instruction()", theDocument, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE);
const snapshotCOVID19 = theDocument
.evaluate("//processing-instruction('covid19')", theDocument, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE);
Now, since the XPathResult is a snapshot, snapshotLength can be used to get all the snapshotItems.
Again, Array.from makes this easy:
Array.from({
length: snapshotCOVID19.snapshotLength
}, (_, index) => snapshotCOVID19.snapshotItem(index));
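The main difference from the iterator approach is that a snapshot’s set of nodes does not change when the underlying document is mutated, and it can be read any number of times. An iterator result, by contrast, becomes invalid after a mutation: its iterateNext() then throws an InvalidStateError.
To check the existence of a processing instruction, simply check the Array’s length, or whether snapshotCOVID19.snapshotItem(0) returns a Node rather than null.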
Full code
This code snippet demonstrates in full how to get all processing instructions of the form <?covid19?> and, for example, get their nodeValue:
function* consumeDOMIterator(iterator){
const method = (iterator instanceof NodeIterator || iterator instanceof TreeWalker
? "nextNode"
: iterator instanceof XPathResult && [
XPathResult.UNORDERED_NODE_ITERATOR_TYPE,
XPathResult.ORDERED_NODE_ITERATOR_TYPE
].includes(iterator.resultType)
? "iterateNext"
: null);
if(!method){
yield* iterator[Symbol.iterator]();
return;
}
let node;
while((node = iterator[method]())){
yield node;
}
}
const xml = `<?xml version="1.0" encoding="UTF-8"?>
<root>
<?covid19 content-a?>
<?covid19 content-b?>
<?thing x?>
<child>
<?covid19 content-c?>
<?thing y?>
</child>
<covid19>
Reject this node.
</covid19>
</root>`,
theDocument = new DOMParser().parseFromString(xml, "text/xml"),
iteratorAll = theDocument
.createNodeIterator(theDocument, NodeFilter.SHOW_PROCESSING_INSTRUCTION),
iteratorCOVID19 = theDocument
.createNodeIterator(theDocument, NodeFilter.SHOW_PROCESSING_INSTRUCTION, {
acceptNode(node){
if(node.nodeName.toLowerCase() === "covid19"){
return NodeFilter.FILTER_ACCEPT;
}
return NodeFilter.FILTER_SKIP;
}
}),
xPathAll = theDocument
.evaluate("//processing-instruction()", theDocument, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE),
xPathCOVID19 = theDocument
.evaluate("//processing-instruction('covid19')", theDocument, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE),
snapshotAll = theDocument
.evaluate("//processing-instruction()", theDocument, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE),
snapshotCOVID19 = theDocument
.evaluate("//processing-instruction('covid19')", theDocument, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE);
const demoIterator = (iterator) => Array.from(consumeDOMIterator(iterator), ({ nodeName, nodeValue }) => [
nodeName,
nodeValue
]);
const demoIterator2 = (iterator) => Array.from(consumeDOMIterator(iterator), ({ nodeValue }) => nodeValue);
console.log("All PIs using NodeIterator", demoIterator(iteratorAll));
console.log("All PIs using XPath Iterator", demoIterator(xPathAll));
console.log("All PIs using XPath Snapshot", Array.from({
length: snapshotAll.snapshotLength
}, (_, index) => {
const {
nodeName,
nodeValue
} = snapshotAll.snapshotItem(index);
return [
nodeName,
nodeValue
];
}));
console.log("All node values of <?covid19?> PIs using NodeIterator", demoIterator2(iteratorCOVID19));
console.log("All node values of <?covid19?> PIs using XPath Iterator", demoIterator2(xPathCOVID19));
console.log("All node values of <?covid19?> PIs using XPath Snapshot", Array.from({
length: snapshotCOVID19.snapshotLength
}, (_, index) => snapshotCOVID19.snapshotItem(index).nodeValue));
Answer:
This seems like a typical use case for XPath, which lets you query XML in a very flexible way.
This tutorial could help: