I have seen some, IMHO, inconsistent behaviour in Python's xmltodict.parse function.
- When an element that can repeat (an "array") occurs only once, iterating over it yields the names of its child elements
- When it occurs more than once, iterating over it yields the elements themselves, each as an OrderedDict
See the example below:
import xmltodict

if __name__ == "__main__":
    xml01 = """
    <A>
        <C>
            <D>DDDD</D>
            <E>EEEE</E>
        </C>
    </A>
    """
    xd = xmltodict.parse(xml01)
    print(xd)
    for x in xd['A']['C']:
        print(f"xml01: {x}")

    xml02 = """
    <A>
        <C>
            <D>DDDD</D>
            <E>EEEE</E>
        </C>
        <C>
            <D>DDDD</D>
            <E>EEEE</E>
        </C>
    </A>
    """
    xd = xmltodict.parse(xml02)
    for x in xd['A']['C']:
        print(f"xml02: {x}")
The output is:
xml01: D
xml01: E
xml02: OrderedDict([('D', 'DDDD'), ('E', 'EEEE')])
xml02: OrderedDict([('D', 'DDDD'), ('E', 'EEEE')])
I would expect that the output of the first iterator is:
xml01: OrderedDict([('D', 'DDDD'), ('E', 'EEEE')])
As it stands, you need to type-check the value returned for C to know whether there is one element or several, and only when there are several do you need an extra loop over them.
I'm curious what Python experts think of this and what their solution would be.
CodePudding user response:
You are right.
If you are asking my opinion, I think it should be changed so that xml01 returns a list with one child.
However, according to https://github.com/martinblech/xmltodict/issues/14, the devs are aware of this but will not fix it.
The accepted workaround is to pass force_list as a parameter to parse, which forces the child element to be a list of OrderedDict even when it occurs only once.
In your case, this would look like this:
import xmltodict

if __name__ == "__main__":
    xml01 = """
    <A>
        <C>
            <D>DDDD</D>
            <E>EEEE</E>
        </C>
    </A>
    """
    # force_list takes a collection of element names. A tuple is used here
    # because set('C') only works for one-character names: set() splits a
    # string into its individual characters.
    xd = xmltodict.parse(xml01, force_list=('C',))
    for x in xd['A']['C']:
        print(f"xml01: {x}")

    xml02 = """
    <A>
        <C>
            <D>DDDD</D>
            <E>EEEE</E>
        </C>
        <C>
            <D>DDDD</D>
            <E>EEEE</E>
        </C>
    </A>
    """
    xd = xmltodict.parse(xml02, force_list=('C',))
    for x in xd['A']['C']:
        print(f"xml02: {x}")
There is also a proposal to override the dict_constructor to use a defaultdict.
You can browse the issue linked above if that's the way you want to go.
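If changing the parse call is not an option, the type check mentioned in the question can at least be kept in one place. Below is a minimal sketch of that idea. The helper name as_list is my own and not part of xmltodict, and the two dictionaries are hand-written stand-ins for what parse returns for xml01 and xml02 (plain dicts instead of OrderedDict, for brevity).

```python
def as_list(value):
    """Return value unchanged if it is already a list, otherwise wrap it in one."""
    if isinstance(value, list):
        return value
    return [value]


# Hand-written stand-ins for the parsed xml01 and xml02 documents (assumption:
# a single <C> is parsed to a dict, repeated <C> elements to a list of dicts).
single = {"A": {"C": {"D": "DDDD", "E": "EEEE"}}}
multiple = {"A": {"C": [{"D": "DDDD", "E": "EEEE"}, {"D": "DDDD", "E": "EEEE"}]}}

# The same loop now works for both shapes.
for x in as_list(single["A"]["C"]):
    print(f"xml01: {x}")
for x in as_list(multiple["A"]["C"]):
    print(f"xml02: {x}")
```

With this, every loop over C sees whole elements regardless of how many there are, at the cost of having to remember to call the helper wherever a repeatable element is read.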