I'm trying to return the values of elements from an XML document that I receive from the database.
The XML in the database looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<record
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
xmlns="http://www.loc.gov/MARC21/slim">
<leader>00524nam a2200145Ia 4500</leader>
<controlfield tag="001">25</controlfield>
<controlfield tag="008">200930s9999 xx 000 0 und d</controlfield>
<datafield tag="090" ind1=" " ind2=" ">
<subfield code="a">25</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">20220914 frey50 </subfield>
</datafield>
<datafield tag="101" ind1=" " ind2=" ">
<subfield code="a">fre</subfield>
</datafield>
<datafield tag="200" ind1=" " ind2=" ">
<subfield code="a">Etude sur les métiers -emplois de l'environnement pour la promotion de l'emploi environnemental comme appui a l'INDH</subfield>
<subfield code="e">synthèse du rapport principal</subfield>
</datafield>
<datafield tag="210" ind1=" " ind2=" ">
<subfield code="c">DES MINES ,DE L'EAU ET DE L'ENVIRONNEMENT</subfield>
</datafield>
<datafield tag="215" ind1=" " ind2=" ">
<subfield code="a">33 p.</subfield>
</datafield>
<datafield tag="610" ind1=" " ind2=" ">
<subfield code="a">ACTEURS;ENVIRENNEMENT;EMPLOI</subfield>
</datafield>
<datafield tag="676" ind1=" " ind2=" ">
<subfield code="a">331.34</subfield>
</datafield>
</record>
I want to get the datafield with tag "200" and its subfield with code "a":
$xml_string = simplexml_load_string($notices->biblio->metadata[0]->metadata);
$nodes = $xml_string->xpath('//datafield[@tag="200"]/subfield[@code="a"]/text()');
I tested the XPath on reeformatter.com and it works perfectly, but when I try to return the nodes I get an empty array. I tried removing text(), but that didn't work either. I tried every variation I could think of and nothing worked.
CodePudding user response:
You are probably better off confronting the namespaces in your XML head-on:
$xml_string->registerXPathNamespace("xxx", "http://www.loc.gov/MARC21/slim");
$node = $xml_string->xpath('//xxx:datafield[@tag="200"]/xxx:subfield[@code="a"]/text()')[0];
echo $node;
Output:
Etude sur les métiers -emplois de l'environnement pour la promotion de l'emploi environnemental comme appui a l'INDH
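For anyone who wants to try this without the database, here is a minimal self-contained sketch of the same idea. The trimmed XML literal and the prefix `marc` are my own choices for illustration; any prefix works as long as the URI matches the one in the document.

```php
<?php
// Trimmed copy of the record from the question (assumption: same structure).
$xml = <<<'XML'
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="200" ind1=" " ind2=" ">
    <subfield code="a">Etude sur les métiers</subfield>
    <subfield code="e">synthèse du rapport principal</subfield>
  </datafield>
  <datafield tag="215" ind1=" " ind2=" ">
    <subfield code="a">33 p.</subfield>
  </datafield>
</record>
XML;

$record = simplexml_load_string($xml);

// Bind an arbitrary prefix to the document's default namespace URI.
$record->registerXPathNamespace('marc', 'http://www.loc.gov/MARC21/slim');

// The unprefixed query is the one from the question; the prefixed one is the fix.
$unprefixed = $record->xpath('//datafield[@tag="200"]/subfield[@code="a"]');
$prefixed   = $record->xpath('//marc:datafield[@tag="200"]/marc:subfield[@code="a"]');

// Cast the first match to a string to get the element's text content.
$title = $prefixed ? (string) $prefixed[0] : null;
echo $title, PHP_EOL;
```

Comparing `$unprefixed` and `$prefixed` side by side makes it easy to see that only the namespace prefix separates the failing query from the working one.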
CodePudding user response:
Your XPath was correct; the problem is the namespace inside your XML.
I found this snippet somewhere deep in some php.net answers:
$notices->biblio->metadata[0]->metadata = str_replace('xmlns=', 'ns=', $notices->biblio->metadata[0]->metadata);
After that you can call your xpath to get the desired node:
$simplexml = simplexml_load_string($notices->biblio->metadata[0]->metadata);
$nodes = $simplexml->xpath('//datafield[@tag="200"]/subfield[@code="a"]/text()');
var_dump($nodes);die();
You might want to consider the OOP approach using SimpleXMLElement:
$simplexml = new SimpleXMLElement($notices->biblio->metadata[0]->metadata);
$nodes = $simplexml->xpath('//datafield[@tag="200"]/subfield[@code="a"]/text()');
var_dump($nodes);die();
To be honest, I don't know why you would prefer one over the other. Maybe someone in the comments can tell me whether there is any value in using simplexml_load_string() instead of SimpleXMLElement.
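On that open question, one difference I am aware of is error handling: for valid input both give you a SimpleXMLElement, but on malformed input the function returns false while the constructor throws. A small sketch, where the broken XML string is made up for the demonstration:

```php
<?php
// Deliberately malformed XML (hypothetical input, not from the question).
$broken = '<record><leader>unclosed</record>';

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Procedural style: failure is signalled by a false return value.
$viaFunction = simplexml_load_string($broken);

// Object style: failure is signalled by an exception.
$constructorThrew = false;
try {
    new SimpleXMLElement($broken);
} catch (Exception $e) {
    $constructorThrew = true;
}

libxml_clear_errors();

var_dump($viaFunction === false, $constructorThrew);
```

So the choice mostly comes down to whether you prefer checking a return value or catching an exception.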
CodePudding user response:
You can iterate over the elements and pick out the appropriate level. I don't use XPath much, so I'm not sure what the issue is there.
$xml = new SimpleXMLElement('<record
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
xmlns="http://www.loc.gov/MARC21/slim">
<leader>00524nam a2200145Ia 4500</leader>
<controlfield tag="001">25</controlfield>
<controlfield tag="008">200930s9999 xx 000 0 und d</controlfield>
<datafield tag="090" ind1=" " ind2=" ">
<subfield code="a">25</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">20220914 frey50 </subfield>
</datafield>
<datafield tag="101" ind1=" " ind2=" ">
<subfield code="a">fre</subfield>
</datafield>
<datafield tag="200" ind1=" " ind2=" ">
<subfield code="a">Etude sur les métiers -emplois de l\'environnement pour la promotion de l\'emploi environnemental comme appui a l\'INDH</subfield>
<subfield code="e">synthèse du rapport principal</subfield>
</datafield>
<datafield tag="210" ind1=" " ind2=" ">
<subfield code="c">DES MINES ,DE L\'EAU ET DE L\'ENVIRONNEMENT</subfield>
</datafield>
<datafield tag="215" ind1=" " ind2=" ">
<subfield code="a">33 p.</subfield>
</datafield>
<datafield tag="610" ind1=" " ind2=" ">
<subfield code="a">ACTEURS;ENVIRENNEMENT;EMPLOI</subfield>
</datafield>
<datafield tag="676" ind1=" " ind2=" ">
<subfield code="a">331.34</subfield>
</datafield>
</record>');
foreach ($xml->datafield as $data) {
    // Attributes are SimpleXMLElement objects, so compare them as strings.
    if ((string) $data['tag'] === '200') {
        foreach ($data->subfield as $sub) {
            if ((string) $sub['code'] === 'a') {
                echo $sub;
            }
        }
    }
}
CodePudding user response:
You want all leaf nodes whose code attribute is "a" and whose parent's tag attribute is 200:
//*[not(*) and @code="a" and ../@tag=200]
For that, the element names (and therefore their namespace as well) do not matter:
$nodes = simplexml_load_string($notices->biblio->metadata[0]->metadata)
->xpath('//*[not(*) and @code="a" and ../@tag=200]')
;
Previously, the XPath expression referenced the wrong elements: unprefixed names in an XPath 1.0 expression select elements in no namespace, while the elements of interest are in the namespace that the document declares as its default:
xmlns="http://www.loc.gov/MARC21/slim"
XPath requires you to name elements by their QName, and elements that are in a namespace need a prefix for it (which you have to register here, as the XML does not specify a prefix).
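If you would rather not hard-code the URI, it can be read from the document before registering a prefix. This is only a sketch: the shortened XML and the prefix `m` are assumptions for the example.

```php
<?php
// Shortened stand-in for the record from the question (assumption).
$xml = '<record xmlns="http://www.loc.gov/MARC21/slim">'
     . '<datafield tag="200"><subfield code="a">Title</subfield></datafield>'
     . '</record>';

$record = simplexml_load_string($xml);

// getDocNamespaces() maps prefixes to URIs; the default namespace
// is stored under the empty-string key.
$namespaces = $record->getDocNamespaces();
$defaultUri = $namespaces[''] ?? null;

$matches = [];
if ($defaultUri !== null) {
    // 'm' is an arbitrary prefix; only the URI has to match the document.
    $record->registerXPathNamespace('m', $defaultUri);
    $matches = $record->xpath('//m:datafield[@tag="200"]/m:subfield[@code="a"]');
}

foreach ($matches as $subfield) {
    echo $subfield, PHP_EOL;
}
```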
Additionally, with a SimpleXMLElement::xpath() expression, every text() node match results in its parent element. Therefore you can leave it out.
This assumes, of course, that your XML is well-suited to it, so that each leaf node represents its own content and does not require dedicated text() node handling. For that you would need to use DOMXPath for the XPath expressions (compare dom_import_simplexml()).
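For completeness, a sketch of that DOMXPath route, in case real text nodes are ever needed. The shortened XML and the prefix `m` are again assumptions for the example.

```php
<?php
// Shortened stand-in for the record from the question (assumption).
$xml = '<record xmlns="http://www.loc.gov/MARC21/slim">'
     . '<datafield tag="200"><subfield code="a">Title</subfield></datafield>'
     . '</record>';

$record = simplexml_load_string($xml);

// dom_import_simplexml() returns the DOMElement for the same underlying node.
$element = dom_import_simplexml($record);
$xpath = new DOMXPath($element->ownerDocument);
$xpath->registerNamespace('m', 'http://www.loc.gov/MARC21/slim');

// With DOMXPath, text() yields DOMText nodes rather than their parent elements.
$textNodes = $xpath->query('//m:datafield[@tag="200"]/m:subfield[@code="a"]/text()');

$values = [];
foreach ($textNodes as $textNode) {
    $values[] = $textNode->nodeValue;
}

print_r($values);
```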