How to delete all nodes from DOMDocument except custom ones?

Time:11-20

I have a DOMDocument in PHP and I'm trying to delete all nodes except a container with a specific ID.

Let's say I have the following DOM document:

<section>
  <div id="first-section">
    <ul>
      <li>Test</li>
      <li>Test</li>
    </ul>
  </div>
  <div id="second-section">
    <ul>
      <li>Test</li>
      <li>Test</li>
    </ul>
    <div id="sub-section">
      <h2>Hello World</h2>
    </div>
  </div>
  <div id="third-section">
    <ul>
      <li>Test</li>
      <li>Test</li>
    </ul>
  </div>
</section>

My PHP Code:

$domDocument = $this->domParser->loadHTML($markup);

$xpath = new \DOMXPath($domDocument);
$nlist = $xpath->query("//*[@id='sub-section']");

$domDocument->saveHTML();

This query finds the correct container. But how can I remove every other node from the document, so that in the end only the following markup remains?

<div id="sub-section">
    <h2>Hello World</h2>
</div>

What I tried

I tried to go the reverse way with a query like this: "/*/*[not(@id='test')]". But it does not work well for nested HTML structures. Sometimes, depending on the structure, it removes all nodes.

What's the way to go here?
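One likely reason the reversed query misbehaves: loadHTML() wraps the markup in html and body elements, so "/*/*" addresses body rather than the children of section. The following is a minimal sketch, assuming the markup is loaded with DOMDocument::loadHTML() and no extra flags; $markup is a shortened stand-in for the document above.

```php
<?php
// Shortened stand-in for the markup in the question (assumption for illustration).
$markup = '<section><div id="second-section"><div id="sub-section"><h2>Hello World</h2></div></div></section>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);

$xpath = new DOMXPath($doc);

// "/*" is the implied <html> element, so "/*/*" selects its children.
$matches = $xpath->query("/*/*[not(@id='sub-section')]");

foreach ($matches as $node) {
    echo $node->nodeName, PHP_EOL; // prints: body
}
```

Because body carries no id attribute, the not(...) predicate matches it, and removing that single node takes the whole tree with it.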

CodePudding user response:

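Deleting everything except one node is an odd way to approach this. How do you decide what to keep, especially when the node you want is nested inside nodes you would remove?

I would pick the nodes I need and copy them to a new document.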

Clone a node to a new document

$xml = <<<'_XML'
<section>
  <div id="first-section">
    <ul>
      <li>Test</li>
      <li>Test</li>
    </ul>
  </div>
  <div id="second-section">
    <ul>
      <li>Test</li>
      <li>Test</li>
    </ul>
    <div id="sub-section">
      <h2>Hello World</h2>
    </div>
  </div>
  <div id="third-section">
    <ul>
      <li>Test</li>
      <li>Test</li>
    </ul>
  </div>
</section>
_XML;

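// Collect libxml parse warnings internally instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($xml);

// importNode() with true makes a deep copy that belongs to the new document.
$newDoc = new DOMDocument();
$newDoc->appendChild($newDoc->importNode($doc->getElementById('sub-section'), true));

echo $newDoc->saveHTML();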
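If more than one node has to survive, the same copy approach can be extended with an XPath query. This is only a sketch under assumptions: the two ids in the query and the wrapper div are chosen for illustration and are not part of the original answer.

```php
<?php
// Shortened stand-in for the sample markup (assumption for illustration).
$html = '<section>'
      . '<div id="first-section"><ul><li>Test</li></ul></div>'
      . '<div id="second-section"><div id="sub-section"><h2>Hello World</h2></div></div>'
      . '</section>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath  = new DOMXPath($doc);
$newDoc = new DOMDocument();

// A document can hold only one root element, so the copies go into a wrapper.
$wrapper = $newDoc->appendChild($newDoc->createElement('div'));

// Illustrative selection: keep two containers by id.
foreach ($xpath->query("//*[@id='first-section' or @id='sub-section']") as $node) {
    $wrapper->appendChild($newDoc->importNode($node, true));
}

$result = $newDoc->saveHTML();
echo $result;
```

Only the selected subtrees are copied, so the second-section container that surrounds sub-section does not appear in the new document.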

Extract only one node

When you need only one node, there is an even simpler way: pass that node to saveHTML().

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($xml);
// saveHTML() accepts an optional node and serializes only that subtree.
echo $doc->saveHTML($doc->getElementById('sub-section'));

Output

Both approaches, importNode() into a new document and saveHTML() with a node argument, print the same markup:

<div id="sub-section">
      <h2>Hello World</h2>
    </div>
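Note that getElementById() returns null when no element has the requested id, and importNode() or saveHTML() would then fail. A small guard could look like the sketch below; the function name extractOuterHtmlById is hypothetical and not part of any library.

```php
<?php
// Hypothetical helper: outer HTML of the element with the given id, or null if absent.
function extractOuterHtmlById(string $html, string $id): ?string
{
    // Silence parse warnings for the duration of the call, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementById($id);
    if ($node === null) {
        return null;
    }

    $out = $doc->saveHTML($node);
    return $out === false ? null : $out;
}

// Shortened stand-in for the sample markup (assumption for illustration).
$html = '<section><div id="second-section"><div id="sub-section"><h2>Hello World</h2></div></div></section>';

var_dump(extractOuterHtmlById($html, 'sub-section'));
var_dump(extractOuterHtmlById($html, 'missing')); // NULL
```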

Demo

https://3v4l.org/ttTS6
