Get path of tags using attribute field in XSD

Time:10-23

My current task is to extract information from an XSD file (field type, field name, etc.). The XSD file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSpy v2018 rel. 2 sp1 (x64) (http://www.altova.com) by test (123321) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:complexType name="attribute">
        <xs:annotation>
            <xs:documentation>Атрибуты ОГХ</xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element name="owner_id">
                <xs:annotation>
                    <xs:documentation>Данные о балансодержателе</xs:documentation>
                </xs:annotation>
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="legal_person" type="xs:integer">
                            <xs:annotation>
                                <xs:documentation>ID балансодержателя</xs:documentation>
                            </xs:annotation>
                        </xs:element>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
            <xs:element name="snow_clean_area" type="xs:double">
                <xs:annotation>
                    <xs:documentation>Площадь вывоза снега, кв. м</xs:documentation>
                </xs:annotation>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:schema>

As we can see, some <xs:element> fields have other <xs:element> fields nested inside them.

I need to get the names of all elements in that XSD. But if an element is nested inside another one, I need to write its name as "all_prev_names;cur_name". For the XSD shown above, the result should be:

"owner_id;legal_person"
"snow_clean_area"

With deeper nesting, the name must include the names of all enclosing elements.

I wrote this code:

        def recursive(xml, name=None):
            res = xml.find_all('xs:element')

            if res:
                for elem in res:
                    if name:
                        yield from recursive(elem, name + ';' + elem['name'])
                    else:
                        yield from recursive(elem, elem['name'])
            else:
                if name:
                    yield (name)
                else:
                    yield (xml['name'])

But it produces duplicate paths. The function returns:

"owner_id;legal_person"
"legal_person"
"snow_clean_area"

I need to either fix this code or find another way to solve the task.

CodePudding user response:

If you want to handle any XSD whatsoever, this is going to be a very tough challenge, because there are so many different ways the XSD author can make things hard for you - type restrictions and extensions, substitution groups, named model groups and attribute groups, xsd:import, xsd:redefine, etc. On the other hand, if you only need to process one schema, then you wouldn't be doing it; so you have to decide how much variation to allow for.

Working from a compiled schema that has already been processed using a schema processor is generally going to be a lot easier than working from source XSD files, and will take care of many of the variations where the same thing can be written in different ways. For example, a compiled schema might expand a substitution group as if it were written using xsd:choice.

Given that you're in the Python world, one approach would be to use the Saxon schema processor to compile the source schema into an SCM file (SCM = schema component model). The SCM file is still XML, but it's flattened and normalised and much easier for applications to extract information from.

(I don't know if xmlproc has an API that allows you to access the compiled schema - if it does, that would be another approach.)

Be aware, if you're trying to generate paths such as owner_id;legal_person, that a schema can be recursive and allow infinite nesting, so this approach could lead you to attempt to generate infinite paths (which would probably fail with a stack overflow). You also need to be aware of wildcards (xs:any).
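
For a schema as simple as the one in the question, where all nesting is written inline, a minimal sketch using Python's standard xml.etree.ElementTree might look like the following. The function name element_paths and the trimmed SCHEMA string are illustrative and not part of the original post. The sketch walks direct children only, so a nested element is visited once, through its parent (find_all in the question returns every descendant, which is where the duplicate comes from). It follows inline nesting only and does not resolve type= or ref= references, so it always terminates, but it would miss elements declared through named types, groups or imports.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Trimmed copy of the question's schema (annotations omitted), for illustration.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="attribute">
    <xs:sequence>
      <xs:element name="owner_id">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="legal_person" type="xs:integer"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="snow_clean_area" type="xs:double"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def element_paths(node, prefix=(), sep=";"):
    """Yield one path per leaf xs:element found below `node`."""
    for child in node:  # direct children only, never all descendants
        if child.tag == XS + "element" and "name" in child.attrib:
            path = prefix + (child.attrib["name"],)
            # Look for elements declared inline inside this element.
            nested = list(element_paths(child, path, sep))
            if nested:
                yield from nested
            else:
                yield sep.join(path)
        else:
            # xs:complexType, xs:sequence, xs:annotation, ...:
            # descend without adding a path segment.
            yield from element_paths(child, prefix, sep)


if __name__ == "__main__":
    for path in element_paths(ET.fromstring(SCHEMA)):
        print(path)
```

Resolving named types or element references would need an extra guard (for example a set of types already being expanded) to avoid the infinite nesting mentioned above.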

CodePudding user response:

You can use xml2xpath.sh to generate an XML instance from the XSD and list its XPath expressions: xml2xpath.sh -a -f root -d test.xsd. This requires the xmlbeans package.

The sample provided in the question did not work out of the box, but the one below did. The help text of the xsd2inst utility from the xmlbeans package states:

Generates a document based on the given Schema file having the given element as root. The tool makes reasonable attempts to create a valid document, but this is not always possible since, for example, there are schemas for which no valid instance document can be produced.

Given this XSD

<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSpy v2018 rel. 2 sp1 (x64) (http://www.altova.com) by test (123321) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="root">
    <xs:complexType>
        <xs:annotation>
            <xs:documentation>Атрибуты ОГХ</xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element name="owner_id">
                <xs:annotation>
                    <xs:documentation>Данные о балансодержателе</xs:documentation>
                </xs:annotation>
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="legal_person" type="xs:integer">
                            <xs:annotation>
                                <xs:documentation>ID балансодержателя</xs:documentation>
                            </xs:annotation>
                        </xs:element>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
            <xs:element name="snow_clean_area" type="xs:double">
                <xs:annotation>
                    <xs:documentation>Площадь вывоза снега, кв. м</xs:documentation>
                </xs:annotation>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>
</xs:schema>

The utility will return

xml2xpath.sh -a -f root -d test.xsd 
Creating XML instance starting at element root from test.xsd

xml2xpath: find XPath expressions on /tmp/tmp.FJQYKaDZI0
================================================================================ (2021-10-22 16:39:09 -03)

   -a ; 'abs_path=1'
   -f ; 'tag1=root'
   -d
================================================================================ (2021-10-22 16:39:09 -03)

Namespaces: None
================================================================================ (2021-10-22 16:39:09 -03)

Elements to process (build xpath, add prefix) 4

XPath expressions found: 4 (absolute, unique elements, use -r to override)
================================================================================ (2021-10-22 16:39:09 -03)

/root
/root/owner_id
/root/owner_id/legal_person
/root/snow_clean_area


received EXIT, bye!
================================================================================ (2021-10-22 16:39:09 -03)
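
To turn XPath lines like these into the "parent;child" names the question asks for, a small post-processing step could be enough. This is only a sketch: the function name xpaths_to_names is hypothetical, and it assumes that the first path segment (root here) is the wrapper element added for the tool and should be dropped, and that only leaf paths are wanted.

```python
def xpaths_to_names(xpaths, sep=";"):
    """Convert absolute XPaths into 'parent;child' names for leaf elements."""
    # Keep only lines that look like absolute paths; skip log lines.
    paths = [p.strip() for p in xpaths if p.strip().startswith("/")]
    names = []
    for p in paths:
        # A path is a leaf when no other path extends it.
        if any(other.startswith(p + "/") for other in paths):
            continue
        # Drop the first segment (assumed to be the wrapper root element).
        segments = p.strip("/").split("/")[1:]
        if segments:
            names.append(sep.join(segments))
    return names


if __name__ == "__main__":
    lines = [
        "/root",
        "/root/owner_id",
        "/root/owner_id/legal_person",
        "/root/snow_clean_area",
    ]
    print(xpaths_to_names(lines))
```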

xmllint with an XPath expression can also be used to get the name and type attributes, but the output would require more parsing:

(echo "setrootns"; echo "xpath //xs:element/@*" ; echo "bye") | xmllint --shell test.xsd
/ > setrootns
/ > xpath //xs:element/@*
Object is a Node Set :
Set contains 6 nodes:
1  ATTRIBUTE name
    TEXT
      content=root
2  ATTRIBUTE name
    TEXT
      content=owner_id
3  ATTRIBUTE name
    TEXT
      content=legal_person
4  ATTRIBUTE type
    TEXT
      content=xs:integer
5  ATTRIBUTE name
    TEXT
      content=snow_clean_area
6  ATTRIBUTE type
    TEXT
      content=xs:double
/ > bye

Alternative

(echo "setrootns"; echo "cat //xs:element/@*" ; echo "bye") | xmllint --shell test.xsd
/ > setrootns
/ > cat //xs:element/@*
 -------
 name="root"
 -------
 name="owner_id"
 -------
 name="legal_person"
 -------
 type="xs:integer"
 -------
 name="snow_clean_area"
 -------
 type="xs:double"
/ > bye
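
Since the question is in Python, the same name/type listing can be sketched with the standard xml.etree.ElementTree module, which avoids parsing the xmllint shell output. The helper name element_attributes and the trimmed SCHEMA string are illustrative; the sketch only reads the attributes written on each xs:element and does not resolve named types.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Trimmed copy of the schema above (annotations omitted), for illustration.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="owner_id">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="legal_person" type="xs:integer"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="snow_clean_area" type="xs:double"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def element_attributes(xsd_text):
    """Return (name, type) for every xs:element, in document order."""
    root = ET.fromstring(xsd_text)
    # iter() visits all descendants; type is None when the element
    # declares its type inline instead of via the type attribute.
    return [(el.get("name"), el.get("type")) for el in root.iter(XS + "element")]


if __name__ == "__main__":
    for name, type_ in element_attributes(SCHEMA):
        print(name, type_)
```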