Python xmltodict shows inconsistent behaviour in XML arrays

I have seen some, IMHO, inconsistent behaviour in Python's xmltodict.parse function.

When an array has only 1 element, iterating over it yields the names of the child elements.
When an array has more than 1 element, iterating over it yields each element as an OrderedDict.

See the example below:

import xmltodict

if __name__ == "__main__":
    xml01 = """
            <A>
                    <C>
                        <D>DDDD</D>
                        <E>EEEE</E>
                    </C>
            </A>
    """

    xd = xmltodict.parse(xml01)
    for x in xd['A']['C']:
        print(f"xml01: {x}")

    xml02 = """
            <A>
                    <C>
                        <D>DDDD</D>
                        <E>EEEE</E>
                    </C>
                    <C>
                        <D>DDDD</D>
                        <E>EEEE</E>
                    </C>
            </A>
    """

    xd = xmltodict.parse(xml02)
    for x in xd['A']['C']:
        print(f"xml02: {x}")

The output is:

xml01: D
xml01: E
xml02: OrderedDict([('D', 'DDDD'), ('E', 'EEEE')])
xml02: OrderedDict([('D', 'DDDD'), ('E', 'EEEE')])

I would expect that the output of the first iterator is:

xml01: OrderedDict([('D', 'DDDD'), ('E', 'EEEE')])

As it stands, you need to type-check the value you get back to know whether there was one element or several, and with several elements you need an extra loop.

I'm curious what Python experts think of this and what their solution would be.

Answer:

You are right.

And if you are asking my opinion, I think it should be changed so that xml01 returns a list with one child.

However, according to https://github.com/martinblech/xmltodict/issues/14, the devs are aware of this but will not fix it.

The accepted workaround is to pass force_list as a parameter to parse, which forces the child element to be returned as a list of OrderedDict even when it occurs only once.

In your case, this would look like this:

import xmltodict

if __name__ == "__main__":
    xml01 = """
            <A>
                    <C>
                        <D>DDDD</D>
                        <E>EEEE</E>
                    </C>
            </A>
    """

    xd = xmltodict.parse(xml01, force_list={'C'})
    for x in xd['A']['C']:
        print(f"xml01: {x}")

    xml02 = """
            <A>
                    <C>
                        <D>DDDD</D>
                        <E>EEEE</E>
                    </C>
                    <C>
                        <D>DDDD</D>
                        <E>EEEE</E>
                    </C>
            </A>
    """

    xd = xmltodict.parse(xml02, force_list={'C'})
    for x in xd['A']['C']:
        print(f"xml02: {x}")
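
A note on building the force_list collection for longer tag names: wrapping a tag name in set(...) only works for single-character names, because set() iterates over the string and splits it into characters. The sketch below uses plain Python only; "Item" is a hypothetical tag name, and it assumes xmltodict checks tag names by membership in the collection you pass.

```python
# set() iterates over its argument, so a string is split into characters.
single_char = set("C")       # one-character name: same as {"C"}
multi_char = set("Item")     # hypothetical longer tag name, split into letters
literal = {"Item"}           # a set literal keeps the whole name intact

print(single_char == {"C"})   # True
print("Item" in multi_char)   # False: only "I", "t", "e", "m" are members
print("Item" in literal)      # True
```

A set literal such as {"Item"} or a tuple such as ("Item",) avoids the problem.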

There is also a proposal in that issue to override the dict_constructor so that it uses defaultdict.

You can browse the issue if that's the way you want to go.
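
If you cannot change the parse call, the type check mentioned in the question can be wrapped in a small helper so that the loop itself stays the same in both cases. This is only a sketch: as_list is a hypothetical helper, not part of xmltodict, and the two dictionaries below are hand-written stand-ins for the shapes that parse returns for xml01 and xml02.

```python
def as_list(value):
    """Return value unchanged if it is already a list, otherwise wrap it in one."""
    if isinstance(value, list):
        return value
    return [value]


# Stand-ins for the parsed xml01 (one <C>) and xml02 (two <C> elements).
single = {"A": {"C": {"D": "DDDD", "E": "EEEE"}}}
double = {"A": {"C": [{"D": "DDDD", "E": "EEEE"},
                      {"D": "DDDD", "E": "EEEE"}]}}

for name, doc in (("xml01", single), ("xml02", double)):
    # Every x is now a mapping for one <C> element, never a key name.
    for x in as_list(doc["A"]["C"]):
        print(f"{name}: {x}")
```

The drawback compared with force_list is that you have to remember to call the helper at every place where an element may repeat.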