XML to Python dictionary

I have an XML file whose contents form a tree, like the one below:

<?xml version="1.0" encoding="UTF-8"?>
<fee_config>
    <fees member_group="00400F" mail_retail="MAIL">
        <admin_fee>0.76</admin_fee>
        <processing_fee>1.83</processing_fee>
    </fees>
    <fees member_group="00400F" mail_retail="RETAIL">
        <admin_fee>1.335</admin_fee>
        <processing_fee>1.645</processing_fee>
    </fees>
    <fees member_group="00460G" mail_retail="MAIL">
        <admin_fee>0.88</admin_fee>
        <processing_fee>1.18</processing_fee>
    </fees>
</fee_config>

What are the various ways I can convert this to a simple dictionary in Python?

CodePudding user response:

There is no one true mapping from XML to a Python dict: one is a node tree, the other is a hash map, so it's an apples-to-something-else comparison. You'll have to make design decisions for yourself, based on what you want out of the result.
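To make one such design decision concrete, here is a minimal sketch that skips the generic tree shape and keys each fees element by its two attributes. It assumes the sample from the question is well-formed (root element closed), that every fees element carries both member_group and mail_retail, and that the fee values should become floats. The name fees_by_group is only illustrative.

```python
from xml.etree import ElementTree as ET

# Sample from the question, trimmed to two entries (assumed well-formed)
SAMPLE = """<fee_config>
  <fees member_group="00400F" mail_retail="MAIL">
    <admin_fee>0.76</admin_fee>
    <processing_fee>1.83</processing_fee>
  </fees>
  <fees member_group="00400F" mail_retail="RETAIL">
    <admin_fee>1.335</admin_fee>
    <processing_fee>1.645</processing_fee>
  </fees>
</fee_config>"""


def fees_by_group(root):
    """Map (member_group, mail_retail) -> {child tag: float value}."""
    result = {}
    for fees in root.findall('fees'):
        # The two attributes together identify one <fees> entry
        key = (fees.get('member_group'), fees.get('mail_retail'))
        # Each child is a leaf such as <admin_fee>0.76</admin_fee>
        result[key] = {child.tag: float(child.text) for child in fees}
    return result


table = fees_by_group(ET.fromstring(SAMPLE))
print(table[('00400F', 'MAIL')]['admin_fee'])  # prints 0.76
```

This shape is convenient for lookups but throws away anything that doesn't fit the fees pattern, which is exactly the kind of trade-off you have to pick for yourself.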

The link by Sreehari has a solution that does a decent job of converting an lxml node to a Python dict, but:

it requires lxml, which is fine, but I like standard modules when they do the job
it doesn't capture attributes

I've taken that code and converted it to work with Python's standard xml.etree.ElementTree module, and it handles attributes in its own way.

When I run this code against your sample, I get the following dict:

{'fees': [{'@attribs': {'mail_retail': 'MAIL', 'member_group': '00400F'},
           'admin_fee': '0.76',
           'processing_fee': '1.83'},
          {'@attribs': {'mail_retail': 'RETAIL', 'member_group': '00400F'},
           'admin_fee': '1.335',
           'processing_fee': '1.645'},
          {'@attribs': {'mail_retail': 'MAIL', 'member_group': '00460G'},
           'admin_fee': '0.88',
           'processing_fee': '1.18'}]}

Notice the @attribs key; that's how I decided attributes should be stored. If you need something else, you can modify the code to your liking:

#!/usr/bin/env python3
from xml.etree import ElementTree as ET
from pprint import pprint


def elem2dict(node):
    """
    Convert an xml.etree.ElementTree node tree into a dict.
    """
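    result = {}

    # Store this node's own attributes once, under the '@attribs' key
    if node.attrib:
        result['@attribs'] = dict(node.attrib)

    for element in node:
        key = element.tag
        if '}' in key:
            # Remove namespace prefix, i.e. the leading '{namespace-uri}' part
            key = key.split('}', 1)[1]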

        # Use the element's text if it has non-whitespace content; otherwise recurse into its children
        if element.text and element.text.strip():
            value = element.text
        else:
            value = elem2dict(element)

        # Check if a node with this name at this depth was already found
        if key in result:
            if not isinstance(result[key], list):
                # We've seen it before, but only once, so wrap both values in a list
                # (the earlier value may be a str, which has no .copy() method)
                result[key] = [result[key], value]
            else:
                # We've seen it at least twice, so it's already a list; just append the new value
                result[key].append(value)
        else:
            # First time we've seen it
            result[key] = value

    return result


root = ET.parse('input.xml').getroot()
pprint(elem2dict(root))
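
Once you have the dict, lookups are plain Python. As a rough sketch, assuming the result has the shape shown in the output above (a 'fees' list whose entries each carry an '@attribs' dict), the helper below selects one entry and converts its fee strings to floats. find_fees is my own illustrative name, not part of any library.

```python
# Shape copied from the output shown earlier (two entries only, for brevity)
config = {
    'fees': [
        {'@attribs': {'mail_retail': 'MAIL', 'member_group': '00400F'},
         'admin_fee': '0.76',
         'processing_fee': '1.83'},
        {'@attribs': {'mail_retail': 'RETAIL', 'member_group': '00400F'},
         'admin_fee': '1.335',
         'processing_fee': '1.645'},
    ]
}


def find_fees(config, member_group, mail_retail):
    """Return the fee values (as floats) for one group/channel, or None."""
    entries = config.get('fees', [])
    if isinstance(entries, dict):
        # A single <fees> element comes back as a dict, not a list
        entries = [entries]
    wanted = {'member_group': member_group, 'mail_retail': mail_retail}
    for entry in entries:
        if entry.get('@attribs') == wanted:
            # Everything except '@attribs' is a fee value stored as a string
            return {k: float(v) for k, v in entry.items() if k != '@attribs'}
    return None


print(find_fees(config, '00400F', 'RETAIL'))
# prints {'admin_fee': 1.335, 'processing_fee': 1.645}
```

The isinstance check matters because a tag that appears only once is stored as a single value rather than a one-item list, so code that reads the dict has to be ready for both shapes.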