No DTD validation and XInclude resolution when using Saxon C HE with Python

Time:03-03

I have a question about Saxon C HE for Python. After a successful installation I ran a few example XSLT transformations, and they all worked.

However, when I parse an XML file, no DTD validation is performed and the XIncludes are not resolved. I have tried many things but could not solve the problem. I hope someone can point out and explain my mistake.

Below is an example that is meant to fail DTD validation on purpose: the document uses an element named FOU, which is not declared in the DTD. When I run the script, it creates a Result.xml file that still contains the invalid FOU element and the unresolved XInclude.

I am aware that this is easy to do with lxml, but I would like to know how it works with the Saxon parser.

XML Master:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEST SYSTEM "Test.dtd">
<TEST>
    <FOU Id="A-1">
        <BAR Name="Test-Bar-1"/>
        <BAR Name="Test-Bar-2"/>
        <BAR Name="Test-Bar-3"/>
    </FOU>
    <TUTU Id="TU-1">
        <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Include.xml" xpointer="xpointer(/node()/node()/*)"/>
    </TUTU>
</TEST>

XML Include:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEST SYSTEM "Test.dtd">
<TEST>
    <TUTU Id="TU-1">
        <TITI Name="Titi-1"/>
        <TITI Name="Titi-2"/>
        <TITI Name="Titi-3"/>
    </TUTU>
</TEST>

DTD:

<!ELEMENT TEST  (FOO  , TUTU )>
<!ELEMENT FOO   (BAR )>
<!ELEMENT BAR   ANY>
<!ELEMENT TUTU  (TITI )>
<!ELEMENT TITI  ANY>
<!-- Attribute -->
<!ATTLIST TEST
>
<!ATTLIST FOO
    Id      ID    #REQUIRED
>
<!ATTLIST BAR
    Name        CDATA #IMPLIED
>
<!ATTLIST TUTU
    Id      ID    #REQUIRED
>
<!ATTLIST TITI 
    Name        CDATA #IMPLIED
>
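
For reference, the mismatch that validation is expected to report can be confirmed with a simple name comparison. This is only a sketch using the Python standard library, not DTD validation: it ignores content models and attributes, and the function name undeclared_elements is made up for illustration.

```python
import re
import xml.etree.ElementTree as ET


def undeclared_elements(xml_text, dtd_text):
    """Return element names used in xml_text that have no <!ELEMENT ...> declaration."""
    # Collect the name that follows each <!ELEMENT keyword in the DTD text.
    declared = set(re.findall(r"<!ELEMENT\s+([^\s>]+)", dtd_text))
    used = set()
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree writes namespaced tags as "{uri}local"; keep the local part only.
        used.add(elem.tag.rsplit("}", 1)[-1])
    return sorted(used - declared)


# Shortened copies of the DTD and master document from the question.
dtd = """<!ELEMENT TEST (FOO, TUTU)>
<!ELEMENT FOO (BAR)>
<!ELEMENT BAR ANY>
<!ELEMENT TUTU (TITI)>
<!ELEMENT TITI ANY>"""
xml = '<TEST><FOU Id="A-1"><BAR Name="Test-Bar-1"/></FOU><TUTU Id="TU-1"/></TEST>'

print(undeclared_elements(xml, dtd))  # prints ['FOU']
```

Because only local names are compared, the xi:include element in the full Master.xml would also be listed (as include), since the DTD does not declare it either.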

Python Script:

import saxonc

with saxonc.PySaxonProcessor(license=False) as proc:
    print(proc.version)
    xdmAtomicval = proc.make_boolean_value(False)
    xsltproc = proc.new_xslt_processor()
    document = proc.parse_xml(xml_file_name='Master.xml')
    print(document)
    
    xsltproc.set_source(xdm_node=document)
    xsltproc.set_output_file("Result.xml")
    xsltproc.compile_stylesheet(stylesheet_file="styl.xslt")
    xsltproc.transform_to_file(stylesheet_file="styl.xslt")
    
    documentRes = proc.parse_xml(xml_file_name='Result.xml')
    print(documentRes)

CodePudding user response:

You should be able to set the xi and dtd configuration properties to "on".

proc.set_configuration_property("xi", "on")
proc.set_configuration_property("dtd", "on")

However, the only way I could get it to work was to remove the xpointer attribute from the xi:include element. I didn't have time to research why it doesn't work with the xpointer in place.

It also doesn't appear that parse_xml() does any validation or XInclude resolution; both did happen during the transform, though. (Set the dtd property to "off" or "recover" to actually get a Result.xml.)

Here's the modified version of your Python script that I used for testing:
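
One way to check whether the XIncludes were resolved in a given output is to count the elements that are still in the XInclude namespace. The following is a minimal sketch using only the standard library; the helper name count_unresolved_xincludes is hypothetical, and it assumes the stylesheet passes unresolved xi:include elements through to the result unchanged.

```python
import xml.etree.ElementTree as ET

XINCLUDE_NS = "http://www.w3.org/2001/XInclude"


def count_unresolved_xincludes(xml_text):
    """Count xi:include elements that are still present in the document."""
    root = ET.fromstring(xml_text)
    # iter() walks the root and all descendants that match the qualified tag name.
    return len(list(root.iter("{%s}include" % XINCLUDE_NS)))


# A fragment shaped like the TUTU element in Master.xml, before resolution.
sample = (
    '<TUTU Id="TU-1">'
    '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Include.xml"/>'
    '</TUTU>'
)
print(count_unresolved_xincludes(sample))  # prints 1
```

Running the same function on the text of Result.xml should return 0 once XInclude resolution has actually taken place.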

import os
import saxonc

with saxonc.PySaxonProcessor(license=False) as proc:
    print(proc.version)
    proc.set_cwd(os.getcwd())
    proc.set_configuration_property("xi", "on")
    proc.set_configuration_property("dtd", "on")

    document = proc.parse_xml(xml_file_name='Master.xml')
    print(document)

    xsltproc = proc.new_xslt30_processor()
    xsltproc.transform_to_file(source_file="Master.xml", stylesheet_file="styl.xslt", output_file="Result.xml")

    documentRes = proc.parse_xml(xml_file_name='Result.xml')
    print(documentRes)

CodePudding user response:

The PyDocumentBuilder class, which is new in SaxonC 11, should let you do DTD validation. See: https://www.saxonica.com/saxon-c/doc11/html/saxonc.html#PyDocumentBuilder You should be able to use its dtd_validation method to turn validation on.

You can create a PyDocumentBuilder as follows:

builder = proc.new_document_builder()
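
Put together, parsing with the builder might look like the sketch below. It assumes SaxonC 11 or later and assumes that the setter is named set_dtd_validation and that the builder's parse_xml accepts xml_file_name; please check both against the documentation linked above. The wrapper function parse_with_dtd_validation is only for illustration, and how a validation failure is reported is not covered here.

```python
def parse_with_dtd_validation(proc, xml_path):
    """Parse xml_path with a document builder that has DTD validation switched on.

    proc is expected to be a saxonc.PySaxonProcessor (SaxonC 11 or later).
    """
    builder = proc.new_document_builder()
    # Assumption: the setter is called set_dtd_validation; verify in the linked docs.
    builder.set_dtd_validation(True)
    # Assumption: the builder's parse_xml takes the same keyword as proc.parse_xml.
    return builder.parse_xml(xml_file_name=xml_path)


# Intended usage (requires the saxonc module to be installed):
#
# import saxonc
# with saxonc.PySaxonProcessor(license=False) as proc:
#     node = parse_with_dtd_validation(proc, "Master.xml")
#     print(node)
```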