I have run into puzzling behavior with the DOM method `removeChild()`. When I loop over the child nodes of a `DOMElement` and remove one of them along the way, the loop stops: it does not iterate over the remaining child nodes.
Here is a minimal example:
```php
<?php
$test_string = <<<XML
<test>
<text>A sample text with <i>mixed content</i> of <b>various sorts</b></text>
</test>
XML;

$test_DOMDocument = new DOMDocument();
$test_DOMDocument->loadXML($test_string);
$test_DOMNode = $test_DOMDocument->getElementsByTagName("text");

foreach ($test_DOMNode as $text) {
    foreach ($text->childNodes as $node) {
        if (preg_match("/text/", $node->nodeValue)) {
            echo $node->nodeValue;
            // With this call in place, the inner foreach ends after this node.
            $node->parentNode->removeChild($node);
        } else {
            echo $node->nodeValue;
        }
    }
}
```
If I comment out the line `$node->parentNode->removeChild($node);`, the output is the entire text content, i.e., `A sample text with mixed content of various sorts`, as expected. With that line, however, only the first child node is output, i.e., `A sample text with`. In other words, removing the first child node while the loop is on it apparently ends the loop, and the remaining child nodes are never processed. Why is that?
Thanks in advance for your help!
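A likely explanation, offered with some hedging: `childNodes` returns a live `DOMNodeList`, and as far as I know, `foreach` over it advances by following the current node's next sibling (this may differ between PHP versions). `removeChild()` unlinks the node from the tree, so its `nextSibling` becomes `null` and the iteration has nowhere to go after the first child. The sketch below is not from the original thread; its variable names are illustrative. It only checks the unlinking part directly:

```php
<?php
// Sketch (not from the original thread): inspect what removeChild() does
// to the removed node's links.
$doc = new DOMDocument();
$doc->loadXML('<text>A sample text with <i>mixed content</i> of <b>various sorts</b></text>');
$text = $doc->documentElement;

// The first child is the text node "A sample text with ".
$first = $text->firstChild;
$beforeRemoval = $first->nextSibling->nodeName; // "i" while still attached

$text->removeChild($first);

// Once unlinked, the node has no parent and no siblings, so an iterator
// that advances via nextSibling stops here.
$afterRemoval = $first->nextSibling;
$parentAfter = $first->parentNode;
$remaining = $text->childNodes->length; // <i>, " of ", <b>

echo $beforeRemoval, PHP_EOL;
var_dump($afterRemoval, $parentAfter);
echo $remaining, PHP_EOL;
```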
Answer:
Following the suggestions in the comments on my question, I arrived at this solution:
```php
<?php
$test_string = <<<XML
<test>
<text>A sample text with <i>mixed content</i> of <b>various sorts</b></text>
</test>
XML;

$test_DOMDocument = new DOMDocument();
$test_DOMDocument->loadXML($test_string);
$test_DOMNode = $test_DOMDocument->getElementsByTagName("text");

foreach ($test_DOMNode as $text) {
    $child_nodes = $text->childNodes;
    // Walk the live list backwards: removing item $n does not shift the
    // indices of the items still to be visited (0 .. $n - 1).
    for ($n = $child_nodes->length - 1; $n >= 0; --$n) {
        $node = $child_nodes->item($n);
        if (preg_match("/text/", $node->nodeValue)) {
            echo $node->nodeValue;
            $node->parentNode->removeChild($node);
        } else {
            echo $node->nodeValue;
        }
    }
}
```
That is, I go through the child nodes in reverse order, a technique suggested in another post. This way, all nodes are processed; the output is `various sorts of mixed contentA sample text with`. Note that the text fragments come out in reverse order. In my specific use case, this does not matter because I am not actually echoing the text nodes but performing a different operation on them.
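If the original order does matter, one alternative (my own sketch, not part of the solution above) is to copy the live list into an ordinary PHP array with `iterator_to_array()` before looping. The array is a snapshot, so removing nodes from the document no longer disturbs the iteration, and the fragments come out in document order. The `$output` variable is added here only so the result can be checked:

```php
<?php
// Sketch: iterate over a static copy of the child list so that removing
// nodes does not end the loop early, and fragments keep document order.
$test_string = <<<XML
<test>
<text>A sample text with <i>mixed content</i> of <b>various sorts</b></text>
</test>
XML;

$test_DOMDocument = new DOMDocument();
$test_DOMDocument->loadXML($test_string);

$output = '';
foreach ($test_DOMDocument->getElementsByTagName("text") as $text) {
    // iterator_to_array() copies the live DOMNodeList into a plain array.
    foreach (iterator_to_array($text->childNodes) as $node) {
        $output .= $node->nodeValue;
        if (preg_match("/text/", $node->nodeValue)) {
            $node->parentNode->removeChild($node);
        }
    }
}
echo $output, PHP_EOL; // A sample text with mixed content of various sorts
```

Another common pattern is to store `$node->nextSibling` in a variable before calling `removeChild()` and continue the loop from that saved node.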