I have a question about the SaxonC HE version for Python. After a successful installation I tried some examples that run XSLT transformations, and these all worked.
However, when I parse an XML file, no DTD validation is performed during parsing and the XIncludes are not resolved. I have tried many things but have not been able to solve this problem. I hope someone can point out and explain my error.
Attached is an example that should deliberately fail DTD validation, because the DTD does not declare an element named FOU. When I run the script, it creates a Result.xml file that still contains both the erroneous FOU element and the unresolved XInclude.
I am aware that it is easy to do this with lxml, however I would like to know how it works with the Saxon parser.
XML Master:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEST SYSTEM "Test.dtd">
<TEST>
  <FOU Id="A-1">
    <BAR Name="Test-Bar-1"/>
    <BAR Name="Test-Bar-2"/>
    <BAR Name="Test-Bar-3"/>
  </FOU>
  <TUTU Id="TU-1">
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Include.xml" xpointer="xpointer(/node()/node()/*)"/>
  </TUTU>
</TEST>
XML Include:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEST SYSTEM "Test.dtd">
<TEST>
  <TUTU Id="TU-1">
    <TITI Name="Titi-1"/>
    <TITI Name="Titi-2"/>
    <TITI Name="Titi-3"/>
  </TUTU>
</TEST>
DTD:
<!ELEMENT TEST (FOO , TUTU )>
<!ELEMENT FOO (BAR )>
<!ELEMENT BAR ANY>
<!ELEMENT TUTU (TITI )>
<!ELEMENT TITI ANY>
<!-- Attribute -->
<!ATTLIST TEST
>
<!ATTLIST FOO
Id ID #REQUIRED
>
<!ATTLIST BAR
Name CDATA #IMPLIED
>
<!ATTLIST TUTU
Id ID #REQUIRED
>
<!ATTLIST TITI
Name CDATA #IMPLIED
>
Python Script:
import saxonc

with saxonc.PySaxonProcessor(license=False) as proc:
    print(proc.version)
    xdmAtomicval = proc.make_boolean_value(False)
    xsltproc = proc.new_xslt_processor()
    document = proc.parse_xml(xml_file_name='Master.xml')
    print(document)
    xsltproc.set_source(xdm_node=document)
    xsltproc.set_output_file("Result.xml")
    xsltproc.compile_stylesheet(stylesheet_file="styl.xslt")
    xsltproc.transform_to_file(stylesheet_file="styl.xslt")
    documentRes = proc.parse_xml(xml_file_name='Result.xml')
    print(documentRes)
CodePudding user response:
You should be able to set the xi and dtd configuration properties to "on".
proc.set_configuration_property("xi", "on")
proc.set_configuration_property("dtd", "on")
However, the only way I could get it to work was to remove the xpointer attribute from the xi:include element. I didn't have time to research why the xpointer isn't working.
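If you want to try the same workaround without editing Master.xml by hand, here is a minimal sketch that uses only the Python standard library (xml.dom.minidom), not SaxonC. The helper name strip_xpointers is my own and purely illustrative. Keep in mind that without the xpointer the whole root element of Include.xml is included instead of just the TITI elements, so the result differs from what the original xi:include asks for.

```python
from xml.dom import minidom

XINCLUDE_NS = "http://www.w3.org/2001/XInclude"


def strip_xpointers(xml_text: str) -> str:
    """Return xml_text with the xpointer attribute removed from every xi:include.

    Hypothetical helper for illustration; it is not part of SaxonC.
    """
    doc = minidom.parseString(xml_text)
    for elem in doc.getElementsByTagNameNS(XINCLUDE_NS, "include"):
        if elem.hasAttribute("xpointer"):
            elem.removeAttribute("xpointer")
    # minidom keeps the DOCTYPE declaration but rewrites the XML declaration.
    return doc.toxml()


# Shortened copy of Master.xml, just to demonstrate the helper.
master = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEST SYSTEM "Test.dtd">
<TEST>
<TUTU Id="TU-1">
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Include.xml" xpointer="xpointer(/node()/node()/*)"/>
</TUTU>
</TEST>"""

cleaned = strip_xpointers(master)
print(cleaned)
```

You could write the returned string to a temporary file and point the transform at that file instead of Master.xml.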
It also doesn't appear that parse_xml() does any validation or XInclude resolution, but both did happen during the transform (set the dtd property to "off" or to "recover" to get Result.xml).
Here's the modified version of your Python that I used to test...
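import os
import saxonc

with saxonc.PySaxonProcessor(license=False) as proc:
    print(proc.version)
    # Use the current working directory for the relative file names below.
    proc.set_cwd(os.getcwd())
    # Request XInclude processing and DTD validation.
    proc.set_configuration_property("xi", "on")
    proc.set_configuration_property("dtd", "on")
    # In my tests parse_xml() did not appear to validate or resolve XIncludes.
    document = proc.parse_xml(xml_file_name='Master.xml')
    print(document)
    xsltproc = proc.new_xslt30_processor()
    # Validation and XInclude resolution did happen during the transform.
    xsltproc.transform_to_file(source_file="Master.xml", stylesheet_file="styl.xslt", output_file="Result.xml")
    documentRes = proc.parse_xml(xml_file_name='Result.xml')
    print(documentRes)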
CodePudding user response:
The PyDocumentBuilder class, which is new in SaxonC 11, should enable you to do DTD validation. See: https://www.saxonica.com/saxon-c/doc11/html/saxonc.html#PyDocumentBuilder
You should be able to use the method dtd_validation to set validation.
You can create a PyDocumentBuilder as follows:
proc.new_document_builder()