How can I skip this particular error?
"Element 'baz': No matching global declaration available for the validation root., line 1"
I need to validate a general set of XML/XSD pairs that are not necessarily structured alike, so hard-coded rules that only fit one specific XML structure won't work.
The XSD is produced by GMC Inspire Designer, which is not primarily an XML validator and is very "loose" in how it checks syntax. The global-declaration error occurs in my local validator but not in Inspire Designer, because of that lax checking.
How can I tell lxml to ignore particular kinds of errors and continue validating?
I'm using the following code:
```python
from os import listdir
from os.path import isfile, join

from lxml import etree

# Get a list of all .xml files in the working directory (my_path).
xml_files_from_cwd = [xml_f for xml_f in listdir(my_path)
                      if isfile(join(my_path, xml_f)) and xml_f.lower().endswith(".xml")]

# XMLSchema needs the schema file, not the directory; xsd_path is a hypothetical name.
xml_validator = etree.XMLSchema(file=xsd_path)

for xml in xml_files_from_cwd:
    # recover=True lets the parser tolerate some malformed XML
    recovering_parser = etree.XMLParser(recover=True)
    xml_file = etree.parse(join(my_path, xml), parser=recovering_parser)
    try:
        # assertValid() returns None and raises DocumentInvalid on failure
        xml_validator.assertValid(xml_file)
    except etree.DocumentInvalid as e:
        print(f"File not valid: {e}")
    else:
        print(f"No errors detected in {xml}.")
```
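If the aim is to see every error lxml reports and decide for yourself which types matter, one option is to call `validate()`, which returns a bool instead of raising, and then inspect the schema's `error_log`. This is a minimal sketch, not a way to make libxml2 keep going: the `IGNORED_ERROR_TYPES` set is a hypothetical policy, and ignoring an entry only hides it from the report. For the root-mismatch error in the question (type `SCHEMAV_CVC_ELT_1`), libxml2 skips everything under the root, so there is nothing further to report.

```python
from lxml import etree

# Hypothetical policy: libxml2 error type names we choose not to report.
# SCHEMAV_CVC_ELT_1 is the "No matching global declaration available for the
# validation root" error; after it, libxml2 does not validate the root's subtree.
IGNORED_ERROR_TYPES = {"SCHEMAV_CVC_ELT_1"}


def classify_errors(schema, doc, ignored=IGNORED_ERROR_TYPES):
    """Validate doc and split the reported errors into (ignored, remaining)."""
    schema.validate(doc)  # returns True/False instead of raising
    ignored_errors, remaining_errors = [], []
    for entry in schema.error_log:  # errors from the last validate() call
        bucket = ignored_errors if entry.type_name in ignored else remaining_errors
        bucket.append(f"line {entry.line}: {entry.message}")
    return ignored_errors, remaining_errors


if __name__ == "__main__":
    schema = etree.XMLSchema(etree.fromstring(
        b'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
        b'<xsd:element name="foo"/></xsd:schema>'))
    doc = etree.fromstring(b'<baz><bar BEGIN="1"/></baz>')
    ignored, remaining = classify_errors(schema, doc)
    print("ignored:", ignored)
    print("remaining:", remaining)
```

An empty `remaining` list here does not mean the file is valid; it only means the one reported error was a type you chose to ignore.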
I am having trouble validating an XML file that generally looks like this:
```xml
<baz>
  <bar BEGIN="1">
    ... [repeating elements here]
  </bar>
</baz>
```
And an XSD that follows this format:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="foo">
    <xsd:complexType>
      <xsd:sequence minOccurs="1" maxOccurs="1">
        <xsd:element name="bar" minOccurs="1" maxOccurs="unbounded">
          .... [repeating elements here]
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```
Answer:
The problem is that validity is decided for the whole document: how one part is interpreted can change whether the rest of it is valid.
For example, suppose your documents are validated against this schema:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="foo">
    <xs:complexType>
      <xs:choice>
        <xs:element name="bar">
          <xs:complexType>
            <xs:choice>
              <xs:element name="baz"/>
              <xs:element name="qux"/>
            </xs:choice>
          </xs:complexType>
        </xs:element>
        <xs:element name="quux">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="qux"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```
This document would be a problem:
```xml
<foo>
  <quuz>
    <qux/>
    ...
  </quuz>
</foo>
```
Should quuz be a bar or a quux?
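To make the ambiguity concrete, here is a small sketch, assuming lxml and the schema above inlined as a string. The helper `valid_renamings` is hypothetical: it renames every element with the unknown tag to each candidate in turn and revalidates a copy. With only a single `<qux/>` inside `quuz`, both `bar` and `quux` produce a valid document, so the content seen so far cannot settle the question.

```python
import copy

from lxml import etree

XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="foo">
    <xs:complexType>
      <xs:choice>
        <xs:element name="bar">
          <xs:complexType>
            <xs:choice>
              <xs:element name="baz"/>
              <xs:element name="qux"/>
            </xs:choice>
          </xs:complexType>
        </xs:element>
        <xs:element name="quux">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="qux"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def valid_renamings(schema, root, old_tag, candidates):
    """Return the candidate tags that make the document valid when every
    element named old_tag is renamed to that candidate."""
    valid = []
    for name in candidates:
        trial = copy.deepcopy(root)  # never mutate the caller's tree
        for element in list(trial.iter(old_tag)):
            element.tag = name
        if schema.validate(trial):
            valid.append(name)
    return valid


schema = etree.XMLSchema(etree.fromstring(XSD))
doc = etree.fromstring(b"<foo><quuz><qux/></quuz></foo>")
print(valid_renamings(schema, doc, "quuz", ["bar", "quux"]))  # ['bar', 'quux']
```

Changing the content changes the answer: `<quuz><baz/></quuz>` only fits `bar`, and two `<qux/>` children fit neither.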
You might be able to tell from what follows, but then every time you run into a problem you'd have to backtrack to an earlier decision and try a different choice there.
This gets complicated very quickly, because whether something is valid may depend on its content, its structure, its attribute values, and so on. Soon there are too many options to test; you can even construct situations where the number of choices is practically infinite, so you'd need very complicated logic just to come up with a valid value.
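A back-of-the-envelope illustration of the growth: if each of k ambiguous elements has c plausible names, an exhaustive search has c**k combinations to validate, before attribute values are even considered. This sketch only counts and enumerates renamings; the element labels are hypothetical.

```python
from itertools import product


def candidate_assignments(unknown_elements, candidates):
    """Yield every way of renaming each unknown element to one candidate."""
    for combo in product(candidates, repeat=len(unknown_elements)):
        yield dict(zip(unknown_elements, combo))


def search_space_size(num_unknown, num_candidates):
    """Number of combinations an exhaustive search would have to validate."""
    return num_candidates ** num_unknown


# Two unknown elements with two candidate names each: 4 assignments
print(list(candidate_assignments(["quuz[1]", "quuz[2]"], ["bar", "quux"])))
print(search_space_size(20, 2))  # 1048576
print(search_space_size(20, 5))  # 95367431640625
```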
In simple cases like your example, where only the outer tag may be misnamed, you could simply fix that error in memory and retry validation. But that approach doesn't scale to the whole document.
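For the root-only case, a minimal sketch with lxml might look like this. It assumes the schema has no `targetNamespace` (true for the question's XSD) and only attempts a fix when the schema declares exactly one global element, so the rename is unambiguous. The function names are hypothetical.

```python
import copy

from lxml import etree

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def global_element_names(xsd_root):
    """Names of the top-level (global) element declarations in a schema.

    Assumes the schema has no targetNamespace, as in the question.
    """
    return [el.get("name") for el in xsd_root.findall(f"{{{XSD_NS}}}element")]


def validate_with_root_fix(xsd_root, doc_root):
    """Validate doc_root; if its root tag is not declared globally and the
    schema declares exactly one global element, retry with the root renamed.

    Works on a copy, so doc_root is never modified.
    Returns (is_valid, root_tag_that_was_validated).
    """
    schema = etree.XMLSchema(xsd_root)
    if schema.validate(doc_root):
        return True, doc_root.tag
    names = global_element_names(xsd_root)
    if doc_root.tag in names or len(names) != 1:
        return False, doc_root.tag  # not a simple, unambiguous root mismatch
    trial = copy.deepcopy(doc_root)
    trial.tag = names[0]
    return schema.validate(trial), trial.tag


if __name__ == "__main__":
    xsd = etree.fromstring(
        b'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
        b'<xsd:element name="foo"><xsd:complexType><xsd:sequence>'
        b'<xsd:element name="bar" maxOccurs="unbounded"/>'
        b'</xsd:sequence></xsd:complexType></xsd:element></xsd:schema>')
    doc = etree.fromstring(b'<baz><bar BEGIN="1"/></baz>')
    print(validate_with_root_fix(xsd, doc))  # (True, 'foo')
```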
Note: in real-life scenarios you may know what input to expect. Then you can follow a strategy of validating and, whenever validation fails, fixing the problem (since you do know what the options are) and validating again, until you reach the end of the document. My point is only that there's no general solution here.
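That strategy could be sketched as a loop, assuming lxml: validate, look up a repair for the first reported error's type, apply it, and try again, with a round limit so a repair that doesn't help can't loop forever. The `fixes` mapping and the `rename_root_to` helper are hypothetical; in practice they would encode what you know about your own inputs.

```python
import copy

from lxml import etree


def rename_root_to(tag):
    """Hypothetical repair factory: rename the document root to `tag`."""
    def fix(root, _entry):
        root.tag = tag
        return root
    return fix


def validate_with_known_fixes(schema, doc_root, fixes, max_rounds=10):
    """Validate a copy of doc_root, repairing known error types as they appear.

    fixes maps libxml2 error type names (e.g. "SCHEMAV_CVC_ELT_1") to
    functions taking (root, log_entry) and returning the repaired root.
    Returns (is_valid, final_root).
    """
    root = copy.deepcopy(doc_root)  # leave the caller's tree untouched
    for _ in range(max_rounds):
        if schema.validate(root):
            return True, root
        first_error = schema.error_log[0]
        fix = fixes.get(first_error.type_name)
        if fix is None:
            return False, root  # an error we have no known repair for
        root = fix(root, first_error)
    return schema.validate(root), root


if __name__ == "__main__":
    schema = etree.XMLSchema(etree.fromstring(
        b'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
        b'<xsd:element name="foo"><xsd:complexType><xsd:sequence>'
        b'<xsd:element name="bar" maxOccurs="unbounded"/>'
        b'</xsd:sequence></xsd:complexType></xsd:element></xsd:schema>'))
    fixes = {"SCHEMAV_CVC_ELT_1": rename_root_to("foo")}
    ok, root = validate_with_known_fixes(
        schema, etree.fromstring(b'<baz><bar BEGIN="1"/></baz>'), fixes)
    print(ok, root.tag)  # True foo
```

Only the first error is repaired per round because later errors may be side effects of the earlier one; revalidating after each fix avoids acting on them.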
Answer:
It appears that the answer to "Can we continue validating a file past an initial failure?" is no: beyond simple or trivial cases, there's no guarantee that any further validation results would be meaningful.