SimpleXML get element content with xpath

Time:09-17

I'm trying to return the values of elements from an XML document that I retrieve from the database.

The XML stored in the database looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<record
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
    xmlns="http://www.loc.gov/MARC21/slim">

  <leader>00524nam a2200145Ia 4500</leader>
  <controlfield tag="001">25</controlfield>
  <controlfield tag="008">200930s9999  xx      000 0 und d</controlfield>
  <datafield tag="090" ind1=" " ind2=" ">
    <subfield code="a">25</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">20220914              frey50        </subfield>
  </datafield>
  <datafield tag="101" ind1=" " ind2=" ">
    <subfield code="a">fre</subfield>
  </datafield>
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">Etude sur les métiers -emplois de l'environnement pour la promotion de l'emploi environnemental comme appui a l'INDH</subfield>
    <subfield code="e">synthèse du rapport principal</subfield>
  </datafield>
  <datafield tag="210" ind1=" " ind2=" ">
    <subfield code="c">DES MINES ,DE L'EAU ET DE L'ENVIRONNEMENT</subfield>
  </datafield>
  <datafield tag="215" ind1=" " ind2=" ">
    <subfield code="a">33 p.</subfield>
  </datafield>
  <datafield tag="610" ind1=" " ind2=" ">
    <subfield code="a">ACTEURS;ENVIRENNEMENT;EMPLOI</subfield>
  </datafield>
  <datafield tag="676" ind1=" " ind2=" ">
    <subfield code="a">331.34</subfield>
  </datafield>
</record>

To get the datafield with tag "200" and its subfield with code "a", I use:

$xml_string = simplexml_load_string($notices->biblio->metadata[0]->metadata);

$nodes = $xml_string->xpath('//datafield[@tag="200"]/subfield[@code="a"]/text()');

I tested the XPath expression on freeformatter.com and it works perfectly, but when I try to return the nodes I get an empty array. I tried removing text(), but unfortunately that didn't work either. I've tried every variation I can think of and nothing works.
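A minimal reproduction of the empty result, assuming a trimmed copy of the record above (only datafield 200 kept). The unprefixed query matches nothing, and getDocNamespaces() shows why: the root element declares a default namespace that all the child elements are in.

```php
<?php
// Trimmed copy of the record from the question (only datafield 200 kept).
$record = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">Etude sur les métiers -emplois de l'environnement pour la promotion de l'emploi environnemental comme appui a l'INDH</subfield>
    <subfield code="e">synthèse du rapport principal</subfield>
  </datafield>
</record>
XML;

$xml = simplexml_load_string($record);

// Unprefixed element names in an XPath 1.0 expression mean "no namespace",
// but these elements are in the MARC21 namespace, so nothing matches.
$nodes = $xml->xpath('//datafield[@tag="200"]/subfield[@code="a"]/text()');
echo count($nodes), PHP_EOL; // 0

// getDocNamespaces() lists the namespace declarations; the default namespace has the key "".
$namespaces = $xml->getDocNamespaces();
echo $namespaces[''], PHP_EOL; // http://www.loc.gov/MARC21/slim
```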

Answer:

You are probably better off confronting the namespaces in your XML head-on:

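// Bind a prefix of your choice ("xxx") to the MARC21 namespace URI so XPath can address the elements
$xml_string->registerXPathNamespace("xxx", "http://www.loc.gov/MARC21/slim");
$node = $xml_string->xpath('//xxx:datafield[@tag="200"]/xxx:subfield[@code="a"]/text()')[0];
echo $node;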

Output:

Etude sur les métiers -emplois de l'environnement pour la promotion de l'emploi environnemental comme appui a l'INDH
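A slightly more defensive sketch of the same approach, wrapped in a hypothetical helper marcSubfields() (the function name and the "marc" prefix are my own choices). It returns every matching value as a string, and an empty array when nothing matches, instead of indexing [0] on a possibly empty result. It assumes the tag and code values contain no double quotes, since they are interpolated into the expression.

```php
<?php
// Trimmed copy of the record from the question (most datafields omitted).
$record = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">Etude sur les métiers -emplois de l'environnement pour la promotion de l'emploi environnemental comme appui a l'INDH</subfield>
    <subfield code="e">synthèse du rapport principal</subfield>
  </datafield>
  <datafield tag="215" ind1=" " ind2=" ">
    <subfield code="a">33 p.</subfield>
  </datafield>
</record>
XML;

// Hypothetical helper: all values of subfield $code inside datafield $tag.
// Assumes $tag and $code contain no double quotes (they are interpolated into XPath).
function marcSubfields(SimpleXMLElement $record, string $tag, string $code): array
{
    // The prefix is arbitrary; only the namespace URI has to match the document.
    $record->registerXPathNamespace('marc', 'http://www.loc.gov/MARC21/slim');

    $nodes = $record->xpath(sprintf(
        '//marc:datafield[@tag="%s"]/marc:subfield[@code="%s"]',
        $tag,
        $code
    ));

    // xpath() returns an empty array when nothing matches, and false/null on error.
    if (!is_array($nodes)) {
        return [];
    }

    // Casting a SimpleXMLElement to string yields its text content.
    return array_map('strval', $nodes);
}

$xml = simplexml_load_string($record);
print_r(marcSubfields($xml, '200', 'a'));
print_r(marcSubfields($xml, '999', 'a')); // no such datafield: empty array
```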

Answer:

Your XPath was correct; the problem is the namespace inside your XML.

I found this snippet somewhere deep in some answers on php.net.

$notices->biblio->metadata[0]->metadata = str_replace('xmlns=', 'ns=', $notices->biblio->metadata[0]->metadata);

After that, you can run your XPath expression to get the desired node:

$simplexml = simplexml_load_string($notices->biblio->metadata[0]->metadata);
$nodes = $simplexml->xpath('//datafield[@tag="200"]/subfield[@code="a"]/text()');
var_dump($nodes);die();

You might want to consider the OOP approach using SimpleXMLElement:

$simplexml = new SimpleXMLElement($notices->biblio->metadata[0]->metadata);
$nodes = $simplexml->xpath('//datafield[@tag="200"]/subfield[@code="a"]/text()');
var_dump($nodes);die();

To be honest, I don't know why one would be preferable. Maybe someone in the comments can tell me whether there is any advantage to using simplexml_load_string() instead of SimpleXMLElement.
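On the simplexml_load_string() versus new SimpleXMLElement question: as far as I know, both give you a SimpleXMLElement for well-formed input, and the practical difference is error handling. A small sketch, using a deliberately malformed string of my own:

```php
<?php
// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Deliberately malformed: <datafield> is never closed.
$broken = '<record><datafield></record>';

// Procedural API: returns false on malformed input.
$viaFunction = simplexml_load_string($broken);
var_dump($viaFunction); // bool(false)

// OOP API: the constructor throws an Exception on malformed input.
$threw = false;
try {
    new SimpleXMLElement($broken);
} catch (Exception $e) {
    $threw = true;
}
var_dump($threw); // bool(true)

// For well-formed input, both produce a SimpleXMLElement.
var_dump(simplexml_load_string('<r/>') instanceof SimpleXMLElement); // bool(true)
var_dump((new SimpleXMLElement('<r/>')) instanceof SimpleXMLElement); // bool(true)

libxml_clear_errors();
```

So the choice mostly comes down to whether you prefer checking for false or catching an exception.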

Answer:

You can iterate over the elements and drill down to the appropriate level. I don't use XPath much, so I'm not sure what the issue is there.

$xml = new SimpleXMLElement('<record
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
    xmlns="http://www.loc.gov/MARC21/slim">

  <leader>00524nam a2200145Ia 4500</leader>
  <controlfield tag="001">25</controlfield>
  <controlfield tag="008">200930s9999  xx      000 0 und d</controlfield>
  <datafield tag="090" ind1=" " ind2=" ">
    <subfield code="a">25</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">20220914              frey50        </subfield>
  </datafield>
  <datafield tag="101" ind1=" " ind2=" ">
    <subfield code="a">fre</subfield>
  </datafield>
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">Etude sur les métiers -emplois de l\'environnement pour la promotion de l\'emploi environnemental comme appui a l\'INDH</subfield>
    <subfield code="e">synthèse du rapport principal</subfield>
  </datafield>
  <datafield tag="210" ind1=" " ind2=" ">
    <subfield code="c">DES MINES ,DE L\'EAU ET DE L\'ENVIRONNEMENT</subfield>
  </datafield>
  <datafield tag="215" ind1=" " ind2=" ">
    <subfield code="a">33 p.</subfield>
  </datafield>
  <datafield tag="610" ind1=" " ind2=" ">
    <subfield code="a">ACTEURS;ENVIRENNEMENT;EMPLOI</subfield>
  </datafield>
  <datafield tag="676" ind1=" " ind2=" ">
    <subfield code="a">331.34</subfield>
  </datafield>
</record>');
foreach ($xml->datafield as $data) {
    if ($data['tag'] == 200) {
        foreach ($data->subfield as $sub) {
            if ($sub['code'] == "a") {
                echo $sub;
            }
        }
    }
}
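If you prefer iteration but want to be explicit about the namespace instead of relying on SimpleXML matching default-namespace elements, children() accepts the namespace URI. A sketch on a trimmed copy of the record (the MARC_NS constant is my own naming); the attributes are read through attributes(), since they are unprefixed and therefore in no namespace:

```php
<?php
const MARC_NS = 'http://www.loc.gov/MARC21/slim';

// Trimmed copy of the record from the question (most datafields omitted).
$record = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">Etude sur les métiers -emplois de l'environnement pour la promotion de l'emploi environnemental comme appui a l'INDH</subfield>
    <subfield code="e">synthèse du rapport principal</subfield>
  </datafield>
  <datafield tag="215" ind1=" " ind2=" ">
    <subfield code="a">33 p.</subfield>
  </datafield>
</record>
XML;

$xml = new SimpleXMLElement($record);
$found = [];

// children(MARC_NS) selects child elements in the MARC21 namespace explicitly.
foreach ($xml->children(MARC_NS)->datafield as $datafield) {
    // attributes() without an argument returns attributes in no namespace,
    // which is where unprefixed attributes such as tag="..." live.
    if ((string) $datafield->attributes()->tag !== '200') {
        continue;
    }
    foreach ($datafield->children(MARC_NS)->subfield as $subfield) {
        if ((string) $subfield->attributes()->code === 'a') {
            $found[] = (string) $subfield;
        }
    }
}

print_r($found);
```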

Answer:

You want all leaf nodes whose code attribute is "a" and whose parent's tag attribute is 200:

//*[not(*) and @code="a" and ../@tag=200]

For that, the element names (and therefore also their namespace) do not matter:

$nodes = simplexml_load_string($notices->biblio->metadata[0]->metadata)
    ->xpath('//*[not(*) and @code="a" and ../@tag=200]')
    ;

Previously, the XPath expression referenced the wrong elements: it used unprefixed names, which XPath treats as being in no namespace, while the elements you are interested in are in the namespace declared as the document's default:

xmlns="http://www.loc.gov/MARC21/slim"

XPath requires you to name elements by their QName [1], so elements that are in a namespace need a prefix for that namespace in the expression. Here you have to register that prefix yourself, because the XML does not specify one.
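To illustrate that the prefix is only a local alias: any prefix registered for the same URI works, and the @tag/@code tests need no prefix because unprefixed attributes are in no namespace. The prefixes "marc" and "m21" below are arbitrary choices, and the record is trimmed:

```php
<?php
// Trimmed copy of the record from the question (most datafields omitted).
$record = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">Etude sur les métiers -emplois de l'environnement pour la promotion de l'emploi environnemental comme appui a l'INDH</subfield>
  </datafield>
  <datafield tag="215" ind1=" " ind2=" ">
    <subfield code="a">33 p.</subfield>
  </datafield>
</record>
XML;

$xml = simplexml_load_string($record);
$uri = 'http://www.loc.gov/MARC21/slim';

// The XML never declares these prefixes; they exist only for our XPath expressions.
$xml->registerXPathNamespace('marc', $uri);
$xml->registerXPathNamespace('m21', $uri);

// Element names need a prefix; attribute tests do not.
$a = $xml->xpath('//marc:datafield[@tag="215"]/marc:subfield[@code="a"]');
$b = $xml->xpath('//m21:datafield[@tag="215"]/m21:subfield[@code="a"]');

echo (string) $a[0], PHP_EOL; // 33 p.
echo (string) $b[0], PHP_EOL; // 33 p.
```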

Additionally, with SimpleXMLElement::xpath(), every text() node match is returned as its parent element, so you can leave text() out.
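A quick check of that behaviour on a trimmed copy of the record, using the namespace-agnostic expression from above: with and without text(), SimpleXMLElement::xpath() hands back the subfield element, so both results cast to the same string.

```php
<?php
// Trimmed copy of the record from the question (only datafield 200 kept).
$record = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">Etude sur les métiers -emplois de l'environnement pour la promotion de l'emploi environnemental comme appui a l'INDH</subfield>
    <subfield code="e">synthèse du rapport principal</subfield>
  </datafield>
</record>
XML;

$xml = simplexml_load_string($record);

$withText = $xml->xpath('//*[not(*) and @code="a" and ../@tag=200]/text()');
$withoutText = $xml->xpath('//*[not(*) and @code="a" and ../@tag=200]');

// In both cases the returned object is the <subfield> element, not a text node.
echo $withText[0]->getName(), PHP_EOL;    // subfield
echo $withoutText[0]->getName(), PHP_EOL; // subfield
var_dump((string) $withText[0] === (string) $withoutText[0]); // bool(true)
```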

This works because your XML is well suited to it: each leaf node represents its contents and does not require dedicated text() node handling, for which you would have had to fall back on DOMXPath (compare dom_import_simplexml()).
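For cases where you do need real text nodes, here is a sketch of falling back on DOMXPath via dom_import_simplexml() (it requires the DOM extension; the "marc" prefix is again arbitrary, and the record is trimmed). Both APIs operate on the same underlying libxml document, so nothing is parsed twice:

```php
<?php
// Trimmed copy of the record from the question (only datafield 200 kept).
$record = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">Etude sur les métiers -emplois de l'environnement pour la promotion de l'emploi environnemental comme appui a l'INDH</subfield>
  </datafield>
</record>
XML;

$sxe = simplexml_load_string($record);

// dom_import_simplexml() returns a DOMElement backed by the same libxml tree.
$element = dom_import_simplexml($sxe);
$xpath = new DOMXPath($element->ownerDocument);
$xpath->registerNamespace('marc', 'http://www.loc.gov/MARC21/slim');

$expr = '//marc:datafield[@tag="200"]/marc:subfield[@code="a"]';

// query() returns the text nodes themselves (DOMText objects).
$textNodes = $xpath->query($expr . '/text()');

// evaluate() with string() returns a plain PHP string.
$title = $xpath->evaluate('string(' . $expr . ')');

echo get_class($textNodes->item(0)), PHP_EOL; // DOMText
echo $title, PHP_EOL;
```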


[1] https://stackoverflow.com/a/65163698/367456
