I have an XML file whose contents form a tree, like the one below:
<?xml version="1.0" encoding="UTF-8"?>
<fee_config>
    <fees member_group="00400F" mail_retail="MAIL">
        <admin_fee>0.76</admin_fee>
        <processing_fee>1.83</processing_fee>
    </fees>
    <fees member_group="00400F" mail_retail="RETAIL">
        <admin_fee>1.335</admin_fee>
        <processing_fee>1.645</processing_fee>
    </fees>
    <fees member_group="00460G" mail_retail="MAIL">
        <admin_fee>0.88</admin_fee>
        <processing_fee>1.18</processing_fee>
    </fees>
</fee_config>
What are the various ways I can convert this to a simple dictionary in Python?
CodePudding user response:
There is no one true mapping from XML to a Python dict: one is a node tree and the other is a hash map, so it's an "apples and something-else" comparison. You'll have to make the design decisions yourself, based on what you want out of the result.
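To make the design-decision point concrete, here is a minimal sketch of one possible mapping for the sample in the question: a flat dict keyed by the (member_group, mail_retail) attribute pair, with the fee values converted to float. The function name fees_by_group, the choice of key, and the float conversion are all my own assumptions; nothing in the XML dictates them. The sample is inlined (without the XML declaration) only to keep the sketch self-contained.

```python
from xml.etree import ElementTree as ET

# Sample data from the question, inlined so the sketch runs on its own.
SAMPLE = """
<fee_config>
    <fees member_group="00400F" mail_retail="MAIL">
        <admin_fee>0.76</admin_fee>
        <processing_fee>1.83</processing_fee>
    </fees>
    <fees member_group="00400F" mail_retail="RETAIL">
        <admin_fee>1.335</admin_fee>
        <processing_fee>1.645</processing_fee>
    </fees>
    <fees member_group="00460G" mail_retail="MAIL">
        <admin_fee>0.88</admin_fee>
        <processing_fee>1.18</processing_fee>
    </fees>
</fee_config>
"""


def fees_by_group(root):
    """Map (member_group, mail_retail) -> {child tag: float(child text)}.

    Hypothetical helper: assumes every <fees> child holds a numeric string.
    """
    table = {}
    for fees in root.iter("fees"):
        key = (fees.get("member_group"), fees.get("mail_retail"))
        table[key] = {child.tag: float(child.text) for child in fees}
    return table


root = ET.fromstring(SAMPLE)
table = fees_by_group(root)
print(table[("00400F", "RETAIL")]["admin_fee"])  # 1.335
```

This shape is convenient for lookups but throws away the tree structure, which is exactly the kind of trade-off you have to pick for yourself.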
The link by Sreehari has a solution that does a decent job of converting an lxml node to a Python dict, but:
- it requires lxml, which is fine, but I like standard modules when they do the job
- it doesn't capture attributes
I've taken that code and converted it to work with Python's standard xml.etree.ElementTree module, and it handles attributes in its own way.
When I run this code against your sample, I get the following dict:
{'fees': [{'@attribs': {'mail_retail': 'MAIL', 'member_group': '00400F'},
           'admin_fee': '0.76',
           'processing_fee': '1.83'},
          {'@attribs': {'mail_retail': 'RETAIL', 'member_group': '00400F'},
           'admin_fee': '1.335',
           'processing_fee': '1.645'},
          {'@attribs': {'mail_retail': 'MAIL', 'member_group': '00460G'},
           'admin_fee': '0.88',
           'processing_fee': '1.18'}]}
Notice the @attribs key; that's how I decided attributes should be stored. If you need something else, you can modify it to your liking:
#!/usr/bin/env python3
from xml.etree import ElementTree as ET
from pprint import pprint


def elem2dict(node):
    """
    Convert an xml.etree.ElementTree node tree into a dict.
    """
    result = {}

    # Store this node's own attributes under a reserved key.
    # Attributes on text-only leaf elements are not captured.
    if node.attrib:
        result['@attribs'] = dict(node.items())

    for element in node:
        key = element.tag
        if '}' in key:
            # Remove namespace prefix
            key = key.split('}')[1]

        # Use the text if the element has non-whitespace content;
        # otherwise process the element as a subtree
        if element.text and element.text.strip():
            value = element.text
        else:
            value = elem2dict(element)

        # Check if a node with this name at this depth was already found
        if key in result:
            if not isinstance(result[key], list):
                # We've seen it before, but only once; wrap both values in a list
                # (the earlier value may be a str or a dict)
                result[key] = [result[key], value]
            else:
                # It's already a list, just append the new value
                result[key].append(value)
        else:
            # First time we've seen it
            result[key] = value

    return result


root = ET.parse('input.xml').getroot()
pprint(elem2dict(root))
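One consequence of this design is worth keeping in mind when you read the result: a tag only becomes a list when it repeats, so a file with a single <fees> element would give you a dict under 'fees' rather than a list. Below is a rough sketch of how you might consume the dict with that in mind. The helpers as_list and find_fees are hypothetical names of my own, and the config literal is just a copy of the output shown above, so the snippet runs on its own.

```python
# Copy of the dict printed above, inlined so the sketch is self-contained.
config = {
    'fees': [
        {'@attribs': {'mail_retail': 'MAIL', 'member_group': '00400F'},
         'admin_fee': '0.76', 'processing_fee': '1.83'},
        {'@attribs': {'mail_retail': 'RETAIL', 'member_group': '00400F'},
         'admin_fee': '1.335', 'processing_fee': '1.645'},
        {'@attribs': {'mail_retail': 'MAIL', 'member_group': '00460G'},
         'admin_fee': '0.88', 'processing_fee': '1.18'},
    ]
}


def as_list(value):
    """Wrap a lone value in a list; repeated tags are already lists."""
    return value if isinstance(value, list) else [value]


def find_fees(config, member_group, mail_retail):
    """Return the fees (as floats) for one attribute pair, or None if absent.

    Hypothetical helper: assumes each 'fees' entry is a dict of numeric strings.
    """
    for entry in as_list(config.get('fees', [])):
        attribs = entry.get('@attribs', {})
        if (attribs.get('member_group') == member_group
                and attribs.get('mail_retail') == mail_retail):
            # Everything except the reserved attribute key is a fee value.
            return {k: float(v) for k, v in entry.items() if k != '@attribs'}
    return None


print(find_fees(config, '00460G', 'MAIL'))
# {'admin_fee': 0.88, 'processing_fee': 1.18}
```

If you would rather always get a list for 'fees', the other option is to change elem2dict itself so that it wraps every child value in a list; that is another one of those design decisions.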