I have an XML file that contains the following content.
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article>
<article
xmlns="http://docbook.org/ns/docbook" version="5.0"
xmlns:xlink="http://www.w3.org/1999/xlink" >
<para>
This is an <emphasis role="strong">test</emphasis> sentence.
</para>
</article>
When I use
$xml_data = simplexml_load_string($filedata);
foreach ($xml_data->para as $data) {
echo $data;
}
I get "This is an sentence." but I want to get "This is an <b>test</b> sentence." as the result.
CodePudding user response:
Instead of simplexml_load_string I'd recommend DOMDocument, but that is just a personal preference. A naïve implementation might just do a string replacement, and that might totally work for you. However, since you've provided actual XML that even includes a namespace, I'm going to keep this as XML-centric as possible, while skipping XPath, which could also be used.
This code loads the XML and walks every child node of the root element. If it finds a <para> element, it walks all of the children of that node looking for an <emphasis> node, and if it finds one, it replaces it with a new <b> node.
The replacement process is a little complex, however, because if we just use nodeValue we might lose any markup that lives in there, so we need to walk the children of the <emphasis> node and clone those into our replacement node.
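To make the difference concrete, here is a minimal sketch comparing the two approaches. It assumes a simplified, namespace-free input string that is not part of the original file, and the variable names are only illustrative.

```php
<?php
// Hypothetical, namespace-free sample used only for this comparison.
$dom = new DOMDocument();
$dom->loadXML('<para>This is an <emphasis>hello <em>world</em></emphasis> sentence.</para>');

$emphasis = $dom->getElementsByTagName('emphasis')->item(0);

// Shortcut: nodeValue flattens everything inside <emphasis> to plain text,
// so the nested <em> element is lost.
$flat = $dom->createElement('b', $emphasis->nodeValue);

// Safer: deep-clone each child so nested elements such as <em> survive.
$cloned = $dom->createElement('b');
foreach ($emphasis->childNodes as $child) {
    $cloned->appendChild($child->cloneNode(true));
}

$flatHtml   = $dom->saveHTML($flat);
$clonedHtml = $dom->saveHTML($cloned);

echo $flatHtml, PHP_EOL;
echo $clonedHtml, PHP_EOL;
```

The first line printed contains only text inside <b>, while the second keeps the nested <em> element.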
Because the source document has a namespace, however, we also need to remove that from our final HTML. Since we are going from XML to HTML, I think that is a safe use of str_replace without going too crazy in XML land.
The code should have enough comments to make sense, hopefully.
<?php
$filedata = <<<EOT
<?xml version="1.0" encoding="utf-8" ?>
<article
xmlns="http://docbook.org/ns/docbook" version="5.0"
xmlns:xlink="http://www.w3.org/1999/xlink" >
<para>
This is an <emphasis role="strong">hello <em>world</em></emphasis> sentence.
</para>
</article>
EOT;
$dom = new DOMDocument();
$dom->loadXML($filedata);

foreach ($dom->documentElement->childNodes as $node) {
    if (XML_ELEMENT_NODE === $node->nodeType && 'para' === $node->nodeName) {

        // Replace any emphasis elements. childNodes is a live list, so we
        // iterate over a snapshot to keep going after a node is replaced.
        foreach (iterator_to_array($node->childNodes) as $childNode) {
            if (XML_ELEMENT_NODE === $childNode->nodeType && 'emphasis' === $childNode->nodeName) {
                // This is arguably the most "correct" way to replace, just in case
                // there are extra nodes inside. A cheaper way would be to not loop
                // and just use the nodeValue, however you might lose some HTML.
                $newNode = $dom->createElement('b');
                foreach ($childNode->childNodes as $grandChild) {
                    $newNode->appendChild($grandChild->cloneNode(true));
                }
                $childNode->replaceWith($newNode);
            }
        }

        // Build our output
        $output = '';
        foreach ($node->childNodes as $childNode) {
            $output .= $dom->saveHTML($childNode);
        }

        // The provided XML has a namespace, and when cloning nodes that NS comes
        // along. Since we are going from regular XML to irregular HTML I think
        // a string replacement is best.
        $output = str_replace(' xmlns="http://docbook.org/ns/docbook"', '', $output);
        echo $output;
    }
}
Demo here: https://3v4l.org/04Tc3#v8.0.23
NOTE: PHP 8 added replaceWith. If you are using PHP 7 or earlier, you'd call replaceChild on the parent node instead and adjust the code a bit.
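For PHP 7, the swap could look roughly like the sketch below. The helper name replaceNodeCompat and the simplified, namespace-free input are assumptions made for illustration; the void return type needs PHP 7.1 or later.

```php
<?php
// Hypothetical helper name; uses only replaceChild(), so it runs on PHP 7 and 8.
function replaceNodeCompat(DOMNode $old, DOMNode $new): void
{
    // replaceChild() is called on the parent: new node first, old node second.
    $old->parentNode->replaceChild($new, $old);
}

$dom = new DOMDocument();
$dom->loadXML('<para>This is an <emphasis>test</emphasis> sentence.</para>');

$emphasis = $dom->getElementsByTagName('emphasis')->item(0);

// Build the <b> replacement by deep-cloning the children of <emphasis>.
$bold = $dom->createElement('b');
foreach ($emphasis->childNodes as $child) {
    $bold->appendChild($child->cloneNode(true));
}
replaceNodeCompat($emphasis, $bold);

$result = $dom->saveXML($dom->documentElement);
echo $result, PHP_EOL;
```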
CodePudding user response:
What if you have the following XML?
<entry>
<para>This is the first text</para>
<emphasis>This is the second text</emphasis>
<para>This is the <emphasis>next</emphasis> text</para>
<itemizedlist>
<listitem>
<para>
This is an paragraph inside a list
</para>
</listitem>
<itemizedlist>
<listitem>
<para>
This is an paragraph inside a list inside a list
</para>
</listitem>
</itemizedlist>
</itemizedlist>
</entry>
Using this code:
if (XML_ELEMENT_NODE === $stuff2->nodeType && 'para' === $stuff2->nodeName) {
    $newNode = $dom->createElement('p');
    foreach ($stuff2->childNodes as $grandChild) {
        $newNode->appendChild($grandChild->cloneNode(true));
    }
    $stuff2->replaceWith($newNode);
}
if (XML_ELEMENT_NODE === $stuff2->nodeType && 'itemizedlist' === $stuff2->nodeName) {
    $newNode = $dom->createElement('ul');
    foreach ($stuff2->childNodes as $grandChild) {
        $newNode->appendChild($grandChild->cloneNode(true));
    }
    $stuff2->replaceWith($newNode);
}
if (XML_ELEMENT_NODE === $stuff2->nodeType && 'emphasis' === $stuff2->nodeName) {
    $newNode = $dom->createElement('b');
    foreach ($stuff2->childNodes as $grandChild) {
        $newNode->appendChild($grandChild->cloneNode(true));
    }
    $stuff2->replaceWith($newNode);
}
if (XML_ELEMENT_NODE === $stuff2->nodeType && 'listitem' === $stuff2->nodeName) {
    $newNode = $dom->createElement('li');
    foreach ($stuff2->childNodes as $grandChild) {
        $newNode->appendChild($grandChild->cloneNode(true));
    }
    $stuff2->replaceWith($newNode);
}
only the first <para> is converted, and the result is
<p>This is the first text</p>
<emphasis>This is the second text</emphasis>
<para>This is the <emphasis>next</emphasis> text</para>
<itemizedlist>
<listitem>
<para>This is an paragraph inside a list</para>
</listitem>
<itemizedlist>
<listitem>
<para>This is an paragraph inside a list inside a list</para>
</listitem>
</itemizedlist>
</itemizedlist>
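One possible explanation, offered as an assumption since the surrounding loop is not shown: childNodes is a live list, so replacing the current node inside a foreach over it can end the iteration early, and the cloned children are never converted because the checks are not applied recursively. A minimal sketch that sidesteps both problems builds a converted copy of the tree instead of editing it in place. The TAG_MAP constant and the convertNode helper are hypothetical names, the input is a shortened version of the XML above, and attributes are deliberately dropped.

```php
<?php
// Assumed tag mapping, taken from the element names used in the snippets above.
const TAG_MAP = [
    'para'         => 'p',
    'emphasis'     => 'b',
    'itemizedlist' => 'ul',
    'listitem'     => 'li',
];

// Hypothetical helper: returns a converted copy of $node owned by $out.
// The source tree is never modified, so no live list is changed mid-loop.
function convertNode(DOMNode $node, DOMDocument $out): DOMNode
{
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        // Text, comments, and similar nodes are copied as they are.
        return $out->importNode($node, true);
    }
    // Unknown element names are kept unchanged; attributes are not copied.
    $name = TAG_MAP[$node->localName] ?? $node->localName;
    $copy = $out->createElement($name);
    foreach ($node->childNodes as $child) {
        $copy->appendChild(convertNode($child, $out)); // recurse into every level
    }
    return $copy;
}

$source = new DOMDocument();
$source->loadXML(
    '<entry><para>This is the first text</para>'
    . '<emphasis>This is the second text</emphasis>'
    . '<para>This is the <emphasis>next</emphasis> text</para>'
    . '<itemizedlist><listitem><para>In a list</para></listitem></itemizedlist>'
    . '</entry>'
);

$out  = new DOMDocument();
$html = '';
foreach ($source->documentElement->childNodes as $child) {
    $html .= $out->saveHTML(convertNode($child, $out));
}
echo $html, PHP_EOL;
```

Because the new elements are created with createElement and no namespace, this approach should also avoid carrying a source namespace into the output, although that has not been checked against the DocBook file from the original question.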