As an example, here is a minimal XML manifest:
<manifest xmlns="http://iuclid6.echa.europa.eu/namespaces/manifest/v1"
xmlns:xlink="http://www.w3.org/1999/xlink">
<general-information>
<title>IUCLID 6 container manifest file</title>
<created>Tue Nov 05 11:04:06 EET 2019</created>
<author>SuperUser</author>
</general-information>
<base-document-uuid>f53d48a9-17ef-48f0-8d0e-76d03007bdfe/f53d48a9-17ef-48f0-8d0e-76d03007bdfe</base-document-uuid>
<contained-documents>
<document id="f53d48a9-17ef-48f0-8d0e-76d03007bdfe/f53d48a9-17ef-48f0-8d0e-76d03007bdfe">
<type>DOSSIER</type>
<name xlink:type="simple"
xlink:href="f53d48a9-17ef-48f0-8d0e-76d03007bdfe_f53d48a9-17ef-48f0-8d0e-76d03007bdfe.i6d"
>Initial submission</name>
<first-modification-date>2019-03-27T06:46:39Z</first-modification-date>
<last-modification-date>2019-03-27T06:46:39Z</last-modification-date>
</document>
</contained-documents>
</manifest>
In this case I want to find each xlink:href attribute and replace the name tag with the contents of the file referred to by the xlink:href - in this case f53d48a9-17ef-48f0-8d0e-76d03007bdfe_f53d48a9-17ef-48f0-8d0e-76d03007bdfe.i6d (which is an XML format file as well).
At the moment I use SimpleXML to pull it into an object and then the xml2json library to convert it into a recursive array, but walking that array using the normal methods doesn't give me a way to modify a parent node.
I'm not sure how to move back up the hierarchy. Any suggestions?
CodePudding user response:
So this is where I am right now: xmlToArray from xml2json (https://github.com/tamlyn/xml2json) delivers an array of arrays, with the XML attributes brought out into the array too.
<?php
include('./xml2json.php');
$arrayData = [];
$xmlOptions = array(
// boolean rather than the string "True"
"namespaceRecursive" => true
);
// Walk the array recursively and return a processed copy;
// no references are needed because each level returns its own result
function i6cArray(array $array): array {
foreach ($array as $key => $value) {
if (is_array($value)) {
// recurse into the nested array and store the processed copy
$array[$key] = i6cArray($value);
} elseif ($key === '@xlink:href') {
// strict comparison: with ==, a numeric key 0 matches any string on PHP 7
// we want name.content = contents of the referenced file
$tempxml = simplexml_load_file($value);
if ($tempxml !== false) {
$array['content'] = xmlToArray($tempxml);
}
}
}
return $array;
}
if (file_exists('manifest.xml')) {
$xml = simplexml_load_file('manifest.xml');
$arrayData = xmlToArray($xml,$xmlOptions);
// walk array - we know the initial thing is an array
$arrayData = i6cArray($arrayData);
//output result
$jsonString = json_encode($arrayData, JSON_PRETTY_PRINT);
file_put_contents('dossier.json', $jsonString);
} else {
exit("Failed to open manifest.");
}
?>
I would have liked to remove the @xlink attributes, but it isn't critical, so instead I am going to insert a 'content' value which will be the referenced XML content.
I would still like to have replaced the entire 'name' key with something
CodePudding user response:
A few bits of background before we get into the specific solution:
- The parts of names before a colon are local aliases for a particular namespace, identified by a URI in an xmlns attribute. They need slightly different handling than non-namespaced names; see this reference question for SimpleXML.
- PHP's SimpleXML and DOM extensions both have support for a language called "XPath", which lets you search for elements and attributes based on their parents and/or content.
- The DOM is a more complex API than SimpleXML, but has more powerful features, particularly for writing. You can switch between the two using the functions simplexml_import_dom() and dom_import_simplexml().
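To make the first and third points concrete, here is a minimal sketch. The inline document, the example.i6d file name and the 'checked' attribute are made up for illustration; only the namespace URIs come from the manifest in the question.

```php
<?php
// Hypothetical cut-down manifest, inlined so the sketch runs on its own
$xml = <<<XML
<manifest xmlns="http://iuclid6.echa.europa.eu/namespaces/manifest/v1"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <name xlink:type="simple" xlink:href="example.i6d">Initial submission</name>
</manifest>
XML;

$simplexml = simplexml_load_string($xml);
$name = $simplexml->name[0];

// Plain array access only sees non-namespaced attributes, so this is empty
$plain_href = (string) $name['href'];

// Asking for the namespace by its URI (not its local prefix) finds the attribute
$href = (string) $name->attributes('http://www.w3.org/1999/xlink')->href;

// Switch to DOM for writing; both objects wrap the same underlying node,
// so a change made through DOM is visible through SimpleXML
$dom_element = dom_import_simplexml($name);
$dom_element->setAttribute('checked', 'yes'); // 'checked' is an illustrative attribute

echo $href, "\n";
echo $name['checked'], "\n";
```

Because nothing is copied when switching, you can read with SimpleXML and write with DOM on the same document.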
In this case, we want to find all xlink:href attributes. Looking at the xmlns attributes at the top of the file, we see these are in the http://www.w3.org/1999/xlink namespace. In XPath, you can say "has an attribute" with the syntax [@attributename], so we can use SimpleXML and XPath like this:
$simplexml->registerXPathNamespace('xl', 'http://www.w3.org/1999/xlink');
// the * matches any element name; a bare //[...] is not valid XPath
$elements_with_xlink_hrefs = $simplexml->xpath('//*[@xl:href]');
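As a quick check of that query, the following sketch runs it against a small inline document. The document content and the a.i6d file name are made up. Note that the 'xl' prefix registered for XPath does not have to match the 'xlink' prefix used in the file, because only the namespace URI matters.

```php
<?php
// Hypothetical document: one element with an xlink:href, one without
$xml = <<<XML
<manifest xmlns="http://iuclid6.echa.europa.eu/namespaces/manifest/v1"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <document id="example">
    <type>DOSSIER</type>
    <name xlink:type="simple" xlink:href="a.i6d">Initial submission</name>
  </document>
</manifest>
XML;

$simplexml = simplexml_load_string($xml);
$simplexml->registerXPathNamespace('xl', 'http://www.w3.org/1999/xlink');

// Collect "element name => href" for every element carrying the attribute
$found = [];
foreach ($simplexml->xpath('//*[@xl:href]') as $element) {
    $href = (string) $element->attributes('http://www.w3.org/1999/xlink')->href;
    $found[] = $element->getName() . ' => ' . $href;
}

print_r($found);
```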
For each of those, we want the attribute value:
foreach ( $elements_with_xlink_hrefs as $simplexml_element ) {
$filename = (string)$simplexml_element->attributes('http://www.w3.org/1999/xlink')->href;
// ...
We then want to load that file, and inject it into the document; this is easier with the DOM, but there is a complexity of having to "import" the node so that it's "owned by" the right document.
// load the other file
$other_document = new DOMDocument;
$other_document->load($filename);
// switch to DOM and add it in place
$dom_element = dom_import_simplexml($simplexml_element);
$dom_element->appendChild(
$dom_element->ownerDocument->importNode(
$other_document->documentElement,
true // deep copy; without this only the empty root element is imported
)
);
We can now tidy up and delete the "xlink" attributes:
$dom_element->removeAttributeNs('http://www.w3.org/1999/xlink', 'href');
$dom_element->removeAttributeNs('http://www.w3.org/1999/xlink', 'type');
Once we're done, we can output the whole thing back as one combined XML document:
} // end of foreach loop
echo $simplexml->asXML();
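Since the question asked for the name element to be replaced rather than added to, here is a self-contained sketch that swaps appendChild for replaceChild. The function name inlineLinkedDocuments, the temporary directory and the contents of both files are hypothetical, and it assumes each href is a path relative to the manifest's directory.

```php
<?php
const XLINK_NS = 'http://www.w3.org/1999/xlink';

// Replace every element that has an xlink:href with the root element
// of the file it points to, and return the combined XML
function inlineLinkedDocuments(string $manifestPath): string
{
    $simplexml = simplexml_load_file($manifestPath);
    $simplexml->registerXPathNamespace('xl', XLINK_NS);

    foreach ($simplexml->xpath('//*[@xl:href]') as $element) {
        $href = (string) $element->attributes(XLINK_NS)->href;

        // Assumption: hrefs are relative to the manifest's directory
        $linked = new DOMDocument;
        if (!$linked->load(dirname($manifestPath) . '/' . $href)) {
            continue; // leave the link in place if the file cannot be loaded
        }

        // Import a deep copy into the manifest's document, then swap it in
        $old = dom_import_simplexml($element);
        $new = $old->ownerDocument->importNode($linked->documentElement, true);
        $old->parentNode->replaceChild($new, $old);
    }

    return (string) $simplexml->asXML();
}

// Throwaway files so the sketch can be run as-is (made-up content)
$dir = sys_get_temp_dir() . '/i6c_demo_' . uniqid();
mkdir($dir);
file_put_contents("$dir/dossier.i6d", '<dossier><subject>Example substance</subject></dossier>');
$manifest = <<<XML
<manifest xmlns="http://iuclid6.echa.europa.eu/namespaces/manifest/v1"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <contained-documents>
    <document id="example">
      <type>DOSSIER</type>
      <name xlink:type="simple" xlink:href="dossier.i6d">Initial submission</name>
    </document>
  </contained-documents>
</manifest>
XML;
file_put_contents("$dir/manifest.xml", $manifest);

$combined = inlineLinkedDocuments("$dir/manifest.xml");
echo $combined;
```

Replacing the element discards its text ("Initial submission") and its xlink attributes along with it, so no separate clean-up step is needed. If you want to keep the name, stay with appendChild as above.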