PHP DOM Why does removing a child node of an element with removeChild interrupt a foreach loop over

Time:03-15

I have encountered a puzzling behavior of the DOM method removeChild. When looping over the child nodes of a DOMElement, removing one of these nodes along the way interrupts the loop, i.e., the loop does not iterate over the remaining child nodes.

Here is a minimal example:

$test_string = <<<XML
<test>
<text>A sample text with <i>mixed content</i> of <b>various sorts</b></text>
</test>
XML;

$test_DOMDocument = new DOMDocument();
$test_DOMDocument->loadXML($test_string);
$test_DOMNode = $test_DOMDocument->getElementsByTagName("text");

foreach ($test_DOMNode as $text) {
  foreach ($text->childNodes as $node) {
    if (preg_match("/text/", $node->nodeValue)) {
      echo $node->nodeValue;
      $node->parentNode->removeChild($node);
    } else {
      echo $node->nodeValue;
    }
  }
}

If I comment out the line $node->parentNode->removeChild($node);, then the output is the entire test string, i.e., A sample text with mixed content of various sorts, as expected. With that line, however, only the first child node is output, i.e., A sample text with. That is, removing the first child node as the loop passes over it apparently interrupts the loop; the remaining child nodes are not processed. Why is that?

Thanks in advance for your help!

CodePudding user response:

Following the suggestions in the comments on my question, I came up with the following solution:

$test_string = <<<XML
<test>
<text>A sample text with <i>mixed content</i> of <b>various sorts</b></text>
</test>
XML;

$test_DOMDocument = new DOMDocument();
$test_DOMDocument->loadXML($test_string);
$test_DOMNode = $test_DOMDocument->getElementsByTagName("text");

foreach ($test_DOMNode as $text) {
  $child_nodes = $text->childNodes;
  // Iterate from the last child to the first.
  for ($n = $child_nodes->length - 1; $n >= 0; --$n) {
    $node = $child_nodes->item($n);
    // Echo every node; remove only those whose text matches.
    echo $node->nodeValue;
    if (preg_match("/text/", $node->nodeValue)) {
      $node->parentNode->removeChild($node);
    }
  }
}

That is, I go through the child nodes in reverse order, using a method suggested in another posting. Because removing the item at index $n does not change the indices of the items before it, every node that has not yet been visited is still reachable, and all nodes are processed. The output is various sorts of mixed contentA sample text with. Note the reverse order of the text fragments. In my specific use case, this reversal does not matter because I am not actually echoing the text nodes, but performing another kind of operation on them.
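If the original order does matter, one possible alternative is to copy the child nodes into a plain array before looping. The sketch below assumes that the underlying cause is that childNodes is a live DOMNodeList, so removing a node during the foreach changes the very collection being iterated; the array copy is not affected by removals. The variable names ($xml, $doc, $children, $output) are chosen for illustration and are not from the original post.

```php
<?php
// Same sample document as in the question.
$xml = <<<XML
<test>
<text>A sample text with <i>mixed content</i> of <b>various sorts</b></text>
</test>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$output = '';
foreach ($doc->getElementsByTagName('text') as $text) {
    // Snapshot the live DOMNodeList into a plain PHP array, so that
    // removing nodes cannot disturb the sequence being iterated.
    $children = iterator_to_array($text->childNodes);
    foreach ($children as $node) {
        $output .= $node->nodeValue;
        if (preg_match('/text/', $node->nodeValue)) {
            $text->removeChild($node);
        }
    }
}

echo $output, PHP_EOL;
```

With the sample document, this should visit all four child nodes in forward order and print A sample text with mixed content of various sorts, while still removing the first text node from the tree.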
