re-add CDATA when saving simpleXMLObject

Time:01-13

I am loading a rather large, nested XML file via:

$content = simplexml_load_string(
    $xmlString, null, LIBXML_NOCDATA
);

Is there a simple way to add CDATA escaping back to all nodes when exporting/saving the XML back to a string?

I thought that something like this would work.

 $xmlIterator = new RecursiveIteratorIterator(
    new SimpleXMLIterator($xml_string), 
    RecursiveIteratorIterator::SELF_FIRST
 );
 foreach ($xmlIterator as $nodeName => &$node) {
   $node->textContent = sprintf('<![CDATA[%s]]>', $node->textContent);
 }

As seen here: https://stackoverflow.com/a/31983626/1468708

But it turns out you can't take the node by reference in the foreach, so it can't be updated in place this way. Otherwise I would have tried to add the CDATA via the tree directly.

Answer:

SimpleXML tries to abstract away differences in encoding of the XML, and concentrate on the semantic data content, so it doesn't directly expose the difference between <![CDATA[foo&bar]]> and foo&amp;bar.
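
As a small illustration of that point (the sample XML strings are made up for this sketch), both encodings give the same string through SimpleXML, even though the serialised output still differs:

```php
<?php
// Two encodings of the same text content: a CDATA section and an escaped entity.
$withCdata   = simplexml_load_string('<root><item><![CDATA[foo&bar]]></item></root>');
$withEscapes = simplexml_load_string('<root><item>foo&amp;bar</item></root>');

// Casting an element to string yields its text content, however it was encoded.
var_dump((string) $withCdata->item === (string) $withEscapes->item); // bool(true)

// Serialising shows that the underlying tree still remembers the difference.
echo $withCdata->asXML(), "\n";
echo $withEscapes->asXML(), "\n";
```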

Luckily, PHP also has an implementation of the DOM, which does include these "lower-level" details - and you can use it interchangeably with SimpleXML using dom_import_simplexml and simplexml_import_dom, which switch between the two APIs without re-parsing the XML.
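
A minimal sketch of switching between the two APIs, assuming a made-up one-element document: a change made through the DOM object is visible through the SimpleXML object, because both wrap the same in-memory tree.

```php
<?php
$simplexml = simplexml_load_string('<root><item>old</item></root>');

// Get a DOMElement for the same underlying node (no re-parsing, no copy).
$itemAsDom = dom_import_simplexml($simplexml->item);

// Change the text content through the DOM API...
$itemAsDom->nodeValue = 'new';

// ...and read it back through SimpleXML.
echo (string) $simplexml->item, "\n"; // new

// The reverse direction: wrap a DOM node as a SimpleXMLElement again.
$backToSimple = simplexml_import_dom($itemAsDom);
echo $backToSimple->getName(), "\n"; // item
```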

Specifically, the DOM has "node types" of XML_TEXT_NODE and XML_CDATA_SECTION_NODE, with corresponding DOMText and DOMCdataSection classes. So you need to iterate the DOM recursively, finding any XML_TEXT_NODE nodes and replacing them with a new DOMCdataSection object with the same text.
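
To see those node types in practice, here is a small sketch over a made-up element with mixed children. Note that DOMCdataSection extends DOMText, so an instanceof DOMText check would match both kinds; comparing nodeType tells them apart.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root>plain text<![CDATA[raw & text]]><child/></root>');

$types = [];
foreach ($doc->documentElement->childNodes as $child) {
    $types[] = $child->nodeType;

    if ($child->nodeType === XML_TEXT_NODE) {
        echo "text node: {$child->data}\n";
    } elseif ($child->nodeType === XML_CDATA_SECTION_NODE) {
        echo "CDATA section: {$child->data}\n";
    } else {
        echo "other node: {$child->nodeName}\n";
    }
}
```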

We can use the recursive iterator you have already to get all the elements in the XML, and then switch to the DOM to handle their text content:

$simplexml = new SimpleXMLIterator($xml);

$xmlIterator = new RecursiveIteratorIterator(
    $simplexml,
    RecursiveIteratorIterator::SELF_FIRST
);
foreach ($xmlIterator as $nodeName => $node) {
    $nodeAsDom = dom_import_simplexml($node);
    var_dump($nodeAsDom); // DOM logic goes here
}
 
echo $simplexml->asXML();

For each DOM node, we then loop over its children, looking for text nodes and replacing them. Note that the new DOMCdataSection object needs to be "owned by" the same document as the original, so we create it with the document's createCDATASection() method rather than calling the constructor directly:

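    // childNodes is a live list, so take a snapshot before replacing anything;
    // otherwise the loop can stop early on elements with several text nodes.
    foreach ( iterator_to_array($nodeAsDom->childNodes, false) as $childNode ) {
        if ( $childNode->nodeType === XML_TEXT_NODE ) {
            // Create the CDATA section in the same document, then swap it in.
            $cdataNode = $nodeAsDom->ownerDocument->createCDATASection($childNode->data);
            $nodeAsDom->replaceChild($cdataNode, $childNode);
        }
    }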
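
Putting the two pieces together, a complete run might look like the sketch below. The sample XML is made up, and two choices are my own assumptions rather than part of the answer above: skipping whitespace-only text nodes (so indentation is not wrapped in CDATA), and iterating over a snapshot of the child list.

```php
<?php
// Sample input (hypothetical); LIBXML_NOCDATA mirrors how the question loads its XML.
$xml = '<root><title>Fish &amp; chips</title><group><name>a &lt; b</name></group></root>';
$simplexml = new SimpleXMLIterator($xml, LIBXML_NOCDATA);

$xmlIterator = new RecursiveIteratorIterator(
    $simplexml,
    RecursiveIteratorIterator::SELF_FIRST
);

foreach ($xmlIterator as $node) {
    $nodeAsDom = dom_import_simplexml($node);

    // Snapshot the live child list before modifying it.
    foreach (iterator_to_array($nodeAsDom->childNodes, false) as $childNode) {
        // Assumption: leave whitespace-only text (indentation) untouched.
        if ($childNode->nodeType === XML_TEXT_NODE && trim($childNode->data) !== '') {
            $cdataNode = $nodeAsDom->ownerDocument->createCDATASection($childNode->data);
            $nodeAsDom->replaceChild($cdataNode, $childNode);
        }
    }
}

echo $simplexml->asXML();
```

The iterator visits the descendants of the root element, not the root itself, so text sitting directly inside the root would need one extra pass over dom_import_simplexml($simplexml). Text that itself contains `]]>` cannot live in a single CDATA section and would need separate handling.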