Convert xml to html with emphasis in php

Time:09-22

I have an XML file that contains the following content.

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article>
<article
  xmlns="http://docbook.org/ns/docbook" version="5.0"
  xmlns:xlink="http://www.w3.org/1999/xlink" >
<para>
This is an <emphasis role="strong">test</emphasis> sentence.
</para>
</article>

When I use

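$xml_data = simplexml_load_string($filedata);
foreach ($xml_data->para as $data) {
    // Casting a SimpleXMLElement to string returns only its own text nodes,
    // so the text inside <emphasis> is dropped.
    echo $data;
}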

I got "This is an sentence.", but I want to get "This is an <b>test</b> sentence." as the result.

CodePudding user response:

Instead of simplexml_load_string I'd recommend DOMDocument, but that is just a personal preference. A naïve implementation might just do a string replacement, and that might work fine for you. However, since you've provided real XML that even declares a namespace, I'm going to keep this as XML-centric as possible, while skipping XPath, which could also be used.
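For comparison, here is a minimal sketch of the naïve string-replacement route. It assumes every <emphasis> should become <b>, that emphasis elements never nest, and that the markup stays as regular as in the question. SimpleXML's asXML() returns each <para> with its child markup intact (unlike a plain string cast), and preg_replace() then rewrites the tags; parasToHtml() is a hypothetical helper name.

```php
<?php
// Naive, string-based sketch. Assumptions: every <emphasis> becomes <b>,
// emphasis elements never nest, and the markup is as regular as in the question.
// parasToHtml() is a hypothetical helper name.

function parasToHtml(string $xml): array
{
    $doc = simplexml_load_string($xml);
    $result = [];
    foreach ($doc->para as $para) {
        $markup = $para->asXML();                                         // keeps child elements
        $markup = preg_replace('#^<para[^>]*>|</para>$#', '', $markup);   // strip the wrapper tags
        $markup = preg_replace('#<emphasis\b[^>]*>(.*?)</emphasis>#s', '<b>$1</b>', $markup);
        $result[] = trim($markup);
    }
    return $result;
}

$filedata = <<<'EOT'
<?xml version="1.0" encoding="utf-8" ?>
<article xmlns="http://docbook.org/ns/docbook" version="5.0">
<para>
This is an <emphasis role="strong">test</emphasis> sentence.
</para>
</article>
EOT;

foreach (parasToHtml($filedata) as $html) {
    echo $html, PHP_EOL; // This is an <b>test</b> sentence.
}
```

A regex like this breaks as soon as <emphasis> nests or the markup gets less regular, which is the main reason to prefer working on the DOM.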

The DOMDocument code below loads the XML and walks every child of the root element. If it finds a <para> element, it walks all of that element's children looking for an <emphasis> node, and if it finds one, it replaces it with a new <b> node.

The replacement process is a little complex, however: nodeValue only returns the text, so using it would lose any markup that lives inside <emphasis> (such as the <em> in the sample data). Instead, we walk the children of the <emphasis> node and clone each of them into our replacement node.
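A small standalone comparison of the two options, using a made-up, namespace-free version of the sample markup (not the answer's full code):

```php
<?php
// Standalone comparison: nodeValue versus cloning child nodes when turning
// <emphasis> into <b>. The namespace-free sample markup is made up for illustration.

$dom = new DOMDocument();
$dom->loadXML('<para>This is an <emphasis>hello <em>world</em></emphasis> sentence.</para>');
$emphasis = $dom->getElementsByTagName('emphasis')->item(0);

// Option 1: nodeValue gives only the text, so the nested <em> element is lost.
$viaNodeValue = $dom->createElement('b');
$viaNodeValue->appendChild($dom->createTextNode($emphasis->nodeValue));

// Option 2: deep-cloning every child keeps nested markup intact.
$viaClone = $dom->createElement('b');
foreach ($emphasis->childNodes as $child) {
    $viaClone->appendChild($child->cloneNode(true));
}

echo $dom->saveHTML($viaNodeValue), PHP_EOL; // <b>hello world</b>
echo $dom->saveHTML($viaClone), PHP_EOL;     // <b>hello <em>world</em></b>
```

Cloning keeps the <em>; nodeValue flattens it to plain text.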

Because the source document declares a namespace, we also need to remove that namespace declaration from the final HTML. Since we are going from XML to HTML, I think a str_replace is a safe way to do that without going too crazy in XML land.

The code should have enough comments to make sense, hopefully.

<?php

$filedata = <<<EOT
<?xml version="1.0" encoding="utf-8" ?>
<article
  xmlns="http://docbook.org/ns/docbook" version="5.0"
  xmlns:xlink="http://www.w3.org/1999/xlink" >
<para>
This is an <emphasis role="strong">hello <em>world</em></emphasis> sentence.
</para>
</article>
EOT;

$dom = new DOMDocument();
$dom->loadXML($filedata);

foreach($dom->documentElement->childNodes as $node){
    if(XML_ELEMENT_NODE === $node->nodeType && 'para' === $node->nodeName){
        
        // Replace any emphasis elements
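        // Iterate over a copy of the child list: replacing nodes while walking the
        // live childNodes list can end the loop after the first replacement.
        foreach(iterator_to_array($node->childNodes) as $childNode) {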
            if(XML_ELEMENT_NODE === $childNode->nodeType && 'emphasis' === $childNode->nodeName){
                
                // This is arguably the most "correct" way to replace, just in case
                // there's extra nodes inside. A cheaper way would be to not loop
                // and just use the nodeValue however you might lose some HTML.
                $newNode = $dom->createElement('b');
                foreach($childNode->childNodes as $grandChild){
                    $newNode->appendChild($grandChild->cloneNode(true));
                }
                $childNode->replaceWith($newNode);
            }
        }
        
        // Build our output
        $output = '';
        foreach($node->childNodes as $childNode) {
            $output .= $dom->saveHTML($childNode);
        }
        
        // The provided XML has a namespace, and when cloning nodes that NS comes
        // along. Since we are going from regular XML to irregular HTML I think
        // a string replacement is best.
        $output = str_replace(' xmlns="http://docbook.org/ns/docbook"', '', $output);
        echo $output;
    }
}

Demo here: https://3v4l.org/04Tc3#v8.0.23

NOTE: replaceWith was added in PHP 8.0. On PHP 7 or earlier, use replaceChild instead, e.g. $childNode->parentNode->replaceChild($newNode, $childNode) (new node first, node being replaced second).
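A sketch of the PHP 7 route using replaceChild(). It also locates the <emphasis> elements with DOMXPath, which the answer deliberately skipped; the "db" prefix and the function name replaceEmphasisLegacy() are arbitrary choices for this sketch, and the inline XML is made up.

```php
<?php
// PHP 7-compatible sketch: DOMChildNode::replaceWith() only exists from PHP 8.0,
// so each <emphasis> is swapped for <b> with DOMNode::replaceChild() instead.
// The "db" prefix and replaceEmphasisLegacy() are arbitrary names.

function replaceEmphasisLegacy(DOMDocument $dom)
{
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('db', 'http://docbook.org/ns/docbook');

    // query() returns a fixed list, so replacing nodes inside the loop is safe.
    foreach ($xpath->query('//db:para/db:emphasis') as $emphasis) {
        $b = $dom->createElement('b');
        while ($emphasis->firstChild) {
            $b->appendChild($emphasis->firstChild); // moves the child into <b>
        }
        // replaceChild() takes the new node first, then the node being replaced.
        $emphasis->parentNode->replaceChild($b, $emphasis);
    }
}

$dom = new DOMDocument();
$dom->loadXML('<article xmlns="http://docbook.org/ns/docbook"><para>An <emphasis role="strong">x</emphasis> and <emphasis>y</emphasis>.</para></article>');
replaceEmphasisLegacy($dom);
echo $dom->saveXML($dom->documentElement), PHP_EOL;
```

Because the XPath result is not live, this also handles several <emphasis> elements in the same <para>.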

CodePudding user response:

What if you have the following XML?

<entry>
  <para>This is the first text</para>
  <emphasis>This is the second text</emphasis>
  <para>This is the <emphasis>next</emphasis> text</para>
  <itemizedlist>
    <listitem>
      <para>
        This is an paragraph inside a list
      </para>
    </listitem>
    <itemizedlist>
      <listitem>
        <para>
          This is an paragraph inside a list inside a list
        </para>
      </listitem>
    </itemizedlist>
  </itemizedlist>
</entry>

using

// $stuff2 is the node currently being visited; the surrounding loop is not shown.
if (XML_ELEMENT_NODE === $stuff2->nodeType && 'para' === $stuff2->nodeName) {
    $newNode = $dom->createElement('p');
    foreach ($stuff2->childNodes as $grandChild) {
        $newNode->appendChild($grandChild->cloneNode(true));
    }
    $stuff2->replaceWith($newNode);
}
if (XML_ELEMENT_NODE === $stuff2->nodeType && 'itemizedlist' === $stuff2->nodeName) {
    $newNode = $dom->createElement('ul');
    foreach ($stuff2->childNodes as $grandChild) {
        $newNode->appendChild($grandChild->cloneNode(true));
    }
    $stuff2->replaceWith($newNode);
}
if (XML_ELEMENT_NODE === $stuff2->nodeType && 'emphasis' === $stuff2->nodeName) {
    $newNode = $dom->createElement('b');
    foreach ($stuff2->childNodes as $grandChild) {
        $newNode->appendChild($grandChild->cloneNode(true));
    }
    $stuff2->replaceWith($newNode);
}
if (XML_ELEMENT_NODE === $stuff2->nodeType && 'listitem' === $stuff2->nodeName) {
    $newNode = $dom->createElement('li');
    foreach ($stuff2->childNodes as $grandChild) {
        $newNode->appendChild($grandChild->cloneNode(true));
    }
    $stuff2->replaceWith($newNode);
}

only results in

<p>This is the first text</p>
<emphasis>This is the second text</emphasis>
<para>This is the <emphasis>next</emphasis> text</para>
<itemizedlist>
  <listitem>
    <para>This is an paragraph inside a list</para>
  </listitem>
  <itemizedlist>
    <listitem>
      <para>This is an paragraph inside a list inside a list</para>
    </listitem>
  </itemizedlist>
</itemizedlist>
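Judging by the output, only the first <para> was converted. If $stuff2 comes from a foreach over the <entry> element's childNodes (the loop isn't shown), two things get in the way: such a loop only visits the top level, and replaceWith() unlinks the current node from the live childNodes list, which ends the foreach after the first replacement. A minimal sketch of a recursive alternative follows. It assumes the same tag mapping as the snippet (para to p, emphasis to b, itemizedlist to ul, listitem to li), leaves the <entry> root as it is, and convertChildren() is a hypothetical helper name.

```php
<?php
// Sketch: convert mapped elements at any depth, not just the top level.
// The tag map and the helper name convertChildren() are assumptions for illustration.

function convertChildren(DOMDocument $dom, DOMNode $parent, array $map)
{
    // Snapshot the live childNodes list so replacements don't end the loop early.
    foreach (iterator_to_array($parent->childNodes) as $child) {
        if (!($child instanceof DOMElement)) {
            continue; // text, comments, etc. are kept as they are
        }
        convertChildren($dom, $child, $map); // handle nested elements first
        if (isset($map[$child->localName])) {
            $replacement = $dom->createElement($map[$child->localName]);
            while ($child->firstChild) {
                $replacement->appendChild($child->firstChild); // move children across
            }
            $parent->replaceChild($replacement, $child);
        }
    }
}

$map = ['para' => 'p', 'emphasis' => 'b', 'itemizedlist' => 'ul', 'listitem' => 'li'];

$xml = <<<'EOT'
<entry>
  <para>This is the first text</para>
  <emphasis>This is the second text</emphasis>
  <para>This is the <emphasis>next</emphasis> text</para>
  <itemizedlist>
    <listitem>
      <para>This is an paragraph inside a list</para>
    </listitem>
    <itemizedlist>
      <listitem>
        <para>This is an paragraph inside a list inside a list</para>
      </listitem>
    </itemizedlist>
  </itemizedlist>
</entry>
EOT;

$dom = new DOMDocument();
$dom->loadXML($xml);
convertChildren($dom, $dom->documentElement, $map);
echo $dom->saveHTML($dom->documentElement);
```

Moving the children instead of cloning them avoids copying whole subtrees, and replaceChild() keeps the sketch usable on PHP 7 as well.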