How to take and display the content and tags of an XML file?

I would like to display the tags of an XML file and their contents in a table. I wrote a regex for this, but it doesn't work as I expected.

Here is my xml file:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   </book>
   <book id="bk104">
      <author>Corets, Eva</author>
      <title>Oberon's Legacy</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-03-10</publish_date>
      <description>In post-apocalypse England, the mysterious 
      agent known only as Oberon helps to create a new life 
      for the inhabitants of London. Sequel to Maeve 
      Ascendant.</description>
   </book>
   <book id="bk105">
      <author>Corets, Eva</author>
      <title>The Sundered Grail</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-09-10</publish_date>
      <description>The two daughters of Maeve, half-sisters, 
      battle one another for control of England. Sequel to 
      Oberon's Legacy.</description>
   </book>
   <book id="bk106">
      <author>Randall, Cynthia</author>
      <title>Lover Birds</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-09-02</publish_date>
      <description>When Carla meets Paul at an ornithology 
      conference, tempers fly as feathers get ruffled.</description>
   </book>
   <book id="bk107">
      <author>Thurman, Paula</author>
      <title>Splish Splash</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>A deep sea diver finds true love twenty 
      thousand leagues beneath the sea.</description>
   </book>
   <book id="bk108">
      <author>Knorr, Stefan</author>
      <title>Creepy Crawlies</title>
      <genre>Horror</genre>
      <price>4.95</price>
      <publish_date>2000-12-06</publish_date>
      <description>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk109">
      <author>Kress, Peter</author>
      <title>Paradox Lost</title>
      <genre>Science Fiction</genre>
      <price>6.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>After an inadvertant trip through a Heisenberg
      Uncertainty Device, James Salway discovers the problems 
      of being quantum.</description>
   </book>
   <book id="bk110">
      <author>O'Brien, Tim</author>
      <title>Microsoft .NET: The Programming Bible</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-09</publish_date>
      <description>Microsoft's .NET initiative is explored in 
      detail in this deep programmer's reference.</description>
   </book>
   <book id="bk111">
      <author>O'Brien, Tim</author>
      <title>MSXML3: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-01</publish_date>
      <description>The Microsoft MSXML3 parser is covered in 
      detail, with attention to XML DOM interfaces, XSLT processing, 
      SAX and more.</description>
   </book>
   <book id="bk112">
      <author>Galos, Mike</author>
      <title>Visual Studio 7: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>49.95</price>
      <publish_date>2001-04-16</publish_date>
      <description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.</description>
   </book>
</catalog>

Here is the regex I had used:

preg_match_all("|<[^>]+>(.*)</[^>]+>|U", $content, $matches, PREG_SET_ORDER);

Here is the result of this regex:

array(60) {
  [0]=>
  array(2) {
    [0]=>
    string(37) "<author>Gambardella, Matthew</author>"
    [1]=>
    string(20) "Gambardella, Matthew"
  }
  [1]=>
  array(2) {
    [0]=>
    string(36) "<title>XML Developer's Guide</title>"
    [1]=>
    string(21) "XML Developer's Guide"
  }
  [2]=>
  array(2) {
    [0]=>
    string(23) "<genre>Computer</genre>"
    [1]=>
    string(8) "Computer"
  }
  [3]=>
  array(2) {
    [0]=>
    string(20) "<price>44.95</price>"
    [1]=>
    string(5) "44.95"
  }
  [4]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-10-01</publish_date>"
    [1]=>
    string(10) "2000-10-01"
  }
  [5]=>
  array(2) {
    [0]=>
    string(27) "<author>Ralls, Kim</author>"
    [1]=>
    string(10) "Ralls, Kim"
  }
  [6]=>
  array(2) {
    [0]=>
    string(28) "<title>Midnight Rain</title>"
    [1]=>
    string(13) "Midnight Rain"
  }
  [7]=>
  array(2) {
    [0]=>
    string(22) "<genre>Fantasy</genre>"
    [1]=>
    string(7) "Fantasy"
  }
  [8]=>
  array(2) {
    [0]=>
    string(19) "<price>5.95</price>"
    [1]=>
    string(4) "5.95"
  }
  [9]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-12-16</publish_date>"
    [1]=>
    string(10) "2000-12-16"
  }
  [10]=>
  array(2) {
    [0]=>
    string(28) "<author>Corets, Eva</author>"
    [1]=>
    string(11) "Corets, Eva"
  }
  [11]=>
  array(2) {
    [0]=>
    string(30) "<title>Maeve Ascendant</title>"
    [1]=>
    string(15) "Maeve Ascendant"
  }
  [12]=>
  array(2) {
    [0]=>
    string(22) "<genre>Fantasy</genre>"
    [1]=>
    string(7) "Fantasy"
  }
  [13]=>
  array(2) {
    [0]=>
    string(19) "<price>5.95</price>"
    [1]=>
    string(4) "5.95"
  }
  [14]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-11-17</publish_date>"
    [1]=>
    string(10) "2000-11-17"
  }
  [15]=>
  array(2) {
    [0]=>
    string(28) "<author>Corets, Eva</author>"
    [1]=>
    string(11) "Corets, Eva"
  }
  [16]=>
  array(2) {
    [0]=>
    string(30) "<title>Oberon's Legacy</title>"
    [1]=>
    string(15) "Oberon's Legacy"
  }
  [17]=>
  array(2) {
    [0]=>
    string(22) "<genre>Fantasy</genre>"
    [1]=>
    string(7) "Fantasy"
  }
  [18]=>
  array(2) {
    [0]=>
    string(19) "<price>5.95</price>"
    [1]=>
    string(4) "5.95"
  }
  [19]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2001-03-10</publish_date>"
    [1]=>
    string(10) "2001-03-10"
  }
  [20]=>
  array(2) {
    [0]=>
    string(28) "<author>Corets, Eva</author>"
    [1]=>
    string(11) "Corets, Eva"
  }
  [21]=>
  array(2) {
    [0]=>
    string(33) "<title>The Sundered Grail</title>"
    [1]=>
    string(18) "The Sundered Grail"
  }
  [22]=>
  array(2) {
    [0]=>
    string(22) "<genre>Fantasy</genre>"
    [1]=>
    string(7) "Fantasy"
  }
  [23]=>
  array(2) {
    [0]=>
    string(19) "<price>5.95</price>"
    [1]=>
    string(4) "5.95"
  }
  [24]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2001-09-10</publish_date>"
    [1]=>
    string(10) "2001-09-10"
  }
  [25]=>
  array(2) {
    [0]=>
    string(33) "<author>Randall, Cynthia</author>"
    [1]=>
    string(16) "Randall, Cynthia"
  }
  [26]=>
  array(2) {
    [0]=>
    string(26) "<title>Lover Birds</title>"
    [1]=>
    string(11) "Lover Birds"
  }
  [27]=>
  array(2) {
    [0]=>
    string(22) "<genre>Romance</genre>"
    [1]=>
    string(7) "Romance"
  }
  [28]=>
  array(2) {
    [0]=>
    string(19) "<price>4.95</price>"
    [1]=>
    string(4) "4.95"
  }
  [29]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-09-02</publish_date>"
    [1]=>
    string(10) "2000-09-02"
  }
  [30]=>
  array(2) {
    [0]=>
    string(31) "<author>Thurman, Paula</author>"
    [1]=>
    string(14) "Thurman, Paula"
  }
  [31]=>
  array(2) {
    [0]=>
    string(28) "<title>Splish Splash</title>"
    [1]=>
    string(13) "Splish Splash"
  }
  [32]=>
  array(2) {
    [0]=>
    string(22) "<genre>Romance</genre>"
    [1]=>
    string(7) "Romance"
  }
  [33]=>
  array(2) {
    [0]=>
    string(19) "<price>4.95</price>"
    [1]=>
    string(4) "4.95"
  }
  [34]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-11-02</publish_date>"
    [1]=>
    string(10) "2000-11-02"
  }
  [35]=>
  array(2) {
    [0]=>
    string(30) "<author>Knorr, Stefan</author>"
    [1]=>
    string(13) "Knorr, Stefan"
  }
  [36]=>
  array(2) {
    [0]=>
    string(30) "<title>Creepy Crawlies</title>"
    [1]=>
    string(15) "Creepy Crawlies"
  }
  [37]=>
  array(2) {
    [0]=>
    string(21) "<genre>Horror</genre>"
    [1]=>
    string(6) "Horror"
  }
  [38]=>
  array(2) {
    [0]=>
    string(19) "<price>4.95</price>"
    [1]=>
    string(4) "4.95"
  }
  [39]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-12-06</publish_date>"
    [1]=>
    string(10) "2000-12-06"
  }
  [40]=>
  array(2) {
    [0]=>
    string(29) "<author>Kress, Peter</author>"
    [1]=>
    string(12) "Kress, Peter"
  }
  [41]=>
  array(2) {
    [0]=>
    string(27) "<title>Paradox Lost</title>"
    [1]=>
    string(12) "Paradox Lost"
  }
  [42]=>
  array(2) {
    [0]=>
    string(30) "<genre>Science Fiction</genre>"
    [1]=>
    string(15) "Science Fiction"
  }
  [43]=>
  array(2) {
    [0]=>
    string(19) "<price>6.95</price>"
    [1]=>
    string(4) "6.95"
  }
  [44]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-11-02</publish_date>"
    [1]=>
    string(10) "2000-11-02"
  }
  [45]=>
  array(2) {
    [0]=>
    string(29) "<author>O'Brien, Tim</author>"
    [1]=>
    string(12) "O'Brien, Tim"
  }
  [46]=>
  array(2) {
    [0]=>
    string(52) "<title>Microsoft .NET: The Programming Bible</title>"
    [1]=>
    string(37) "Microsoft .NET: The Programming Bible"
  }
  [47]=>
  array(2) {
    [0]=>
    string(23) "<genre>Computer</genre>"
    [1]=>
    string(8) "Computer"
  }
  [48]=>
  array(2) {
    [0]=>
    string(20) "<price>36.95</price>"
    [1]=>
    string(5) "36.95"
  }
  [49]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-12-09</publish_date>"
    [1]=>
    string(10) "2000-12-09"
  }
  [50]=>
  array(2) {
    [0]=>
    string(29) "<author>O'Brien, Tim</author>"
    [1]=>
    string(12) "O'Brien, Tim"
  }
  [51]=>
  array(2) {
    [0]=>
    string(44) "<title>MSXML3: A Comprehensive Guide</title>"
    [1]=>
    string(29) "MSXML3: A Comprehensive Guide"
  }
  [52]=>
  array(2) {
    [0]=>
    string(23) "<genre>Computer</genre>"
    [1]=>
    string(8) "Computer"
  }
  [53]=>
  array(2) {
    [0]=>
    string(20) "<price>36.95</price>"
    [1]=>
    string(5) "36.95"
  }
  [54]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2000-12-01</publish_date>"
    [1]=>
    string(10) "2000-12-01"
  }
  [55]=>
  array(2) {
    [0]=>
    string(28) "<author>Galos, Mike</author>"
    [1]=>
    string(11) "Galos, Mike"
  }
  [56]=>
  array(2) {
    [0]=>
    string(53) "<title>Visual Studio 7: A Comprehensive Guide</title>"
    [1]=>
    string(38) "Visual Studio 7: A Comprehensive Guide"
  }
  [57]=>
  array(2) {
    [0]=>
    string(23) "<genre>Computer</genre>"
    [1]=>
    string(8) "Computer"
  }
  [58]=>
  array(2) {
    [0]=>
    string(20) "<price>49.95</price>"
    [1]=>
    string(5) "49.95"
  }
  [59]=>
  array(2) {
    [0]=>
    string(39) "<publish_date>2001-04-16</publish_date>"
    [1]=>
    string(10) "2001-04-16"
  }
}

The problem is that this regex does not return all the content of the XML file, I think because of the attributes. How can I get the other tags that are missing from the result? What should I change in the regex, please?

CodePudding user response：

A regex can be used to extract data from an XML string, but it does not recognize nodes or their hierarchy, so it is only useful in very specific cases. The pattern also gets complex very quickly.
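To illustrate that limitation, here is a small sketch that runs the pattern from the question against a shortened, made-up excerpt of the catalog (the variable names are assumptions for this example). Without the "s" modifier, "." does not match a line break, so any element whose content spans several lines is skipped, whether or not it has attributes. Adding "s" does not fix it, because the pattern then pairs an opening tag with the nearest closing tag of any name:

```php
<?php
// Shortened, hypothetical excerpt of the catalog from the question
$content = <<<'XML'
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
</catalog>
XML;

// Pattern from the question: "." stops at line breaks, so <catalog>,
// <book> and the two-line <description> are never matched
preg_match_all('|<[^>]+>(.*)</[^>]+>|U', $content, $matches, PREG_SET_ORDER);
$found = array_column($matches, 0);

// Same pattern with the "s" modifier: "." now crosses line breaks, but the
// first match runs from <catalog> to the nearest closing tag, </author>
preg_match_all('|<[^>]+>(.*)</[^>]+>|Us', $content, $matchesDotAll, PREG_SET_ORDER);
$firstDotAll = $matchesDotAll[0][0];

var_dump($found, $firstDotAll);
```

Only the single-line author element is found by the first call, which matches the 60 results (12 books with 5 single-line elements each) shown in the question.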

Use an XML parser for reading or an XSLT processor for transforming. XPath expressions let you fetch specific nodes or values.

Here is a basic example using DOM:

// bootstrap DOM Xpath
$document = new DOMDocument();
$document->loadXML(getXMLString());
$xpath = new DOMXpath($document);

// iterate "book" elements
foreach ($xpath->evaluate('/catalog/book') as $book) {
    var_dump(
        [
            // read the "id" attribute
            'id' => $book->getAttribute('id'),
            // fetch first "title" element child as string
            'title' => $xpath->evaluate('string(title)', $book)
        ]
    );
}

function getXMLString(): string {
    return <<<'XML'
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>
XML;
}
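Since the question asks for every tag and its content, the same approach can be extended to loop over all element children of each book instead of reading only the title. The following is a minimal sketch under assumptions: the shortened catalog, the $rows layout, and the plain HTML table are illustrative choices, not part of the answer above.

```php
<?php
// Shortened, hypothetical sample; $xml stands in for the real file contents
$xml = <<<'XML'
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
</catalog>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$rows = [];
foreach ($xpath->evaluate('/catalog/book') as $book) {
    // "*" selects only element children, skipping whitespace text nodes
    foreach ($xpath->evaluate('*', $book) as $child) {
        $rows[] = [
            'book'    => $book->getAttribute('id'),
            'tag'     => $child->localName,
            // normalize-space() collapses the line break inside <description>
            'content' => $xpath->evaluate('normalize-space(.)', $child),
        ];
    }
}

// Print one table row per tag, escaping the text for HTML output
echo "<table>\n";
foreach ($rows as $row) {
    echo '<tr><td>' . implode('</td><td>', array_map('htmlspecialchars', $row)) . "</td></tr>\n";
}
echo "</table>\n";
```

Because the parser works on nodes rather than on lines, the multi-line description is picked up like any other element.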

CodePudding user response：

"The right tool for the right job" is a commonly cited expression, and a regex is not, in my opinion, the right tool for parsing XML. The task of presenting the contents of an XML file in table form can best be accomplished with XSLT.

To display both the tag and the content, again using XSLT, the XSL file needs to be modified slightly. Within the <xsl:for-each select="*"> loop, also output the tag name, perhaps like this:

<xsl:value-of select="name()" /> | <xsl:value-of select="text()" />
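For context, here is a minimal sketch of how a stylesheet containing that line might be applied from PHP. The stylesheet, the shortened catalog, and the variable names are assumptions made for illustration and are not the files used in this answer. XSLTProcessor is only available when PHP's xsl extension is enabled.

```php
<?php
// Shortened, hypothetical catalog
$xml = <<<'XML'
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <genre>Computer</genre>
   </book>
</catalog>
XML;

// Assumed stylesheet: one table row per book, one cell per child element,
// each cell showing "tag name | text content"
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/catalog">
    <table>
      <xsl:for-each select="book">
        <tr>
          <td><xsl:value-of select="@id"/></td>
          <xsl:for-each select="*">
            <td><xsl:value-of select="name()" /> | <xsl:value-of select="text()" /></td>
          </xsl:for-each>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
XSL;

$source = new DOMDocument();
$source->loadXML($xml);

$stylesheet = new DOMDocument();
$stylesheet->loadXML($xsl);

// Compile the stylesheet and transform the catalog into an HTML string
$processor = new XSLTProcessor();
$processor->importStylesheet($stylesheet);
$html = $processor->transformToXml($source);

echo $html;
```

Keeping the stylesheet in its own .xsl file and loading it with DOMDocument::load() would work the same way; the heredoc is used here only to keep the sketch self-contained.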

This modification yields: