Python: Unpacking list containing multiple dictionaries

I have a list that contains multiple dicts and lists.

[{'classification': 
 {'description': 'A registered charge', 
 'type': 'charge-description'}, 
 'charge_code': 'SC3802280001', 
 'etag': '157167f8f780f440048f4056da17784dfafe64e5', 
 'delivered_on': '2015-09-04', 
 'persons_entitled': [{'name': 'The Royal Bank of Scotland PLC'}], 
 'created_on': '2015-09-03', 
 'links': {'self': '/company/SC380228/charges/IKH-4F5A4YmihSPe9D8Mq-WAJDw'}, 
 'particulars': {'floating_charge_covers_all': True, 
 'contains_negative_pledge': True, 
 'contains_floating_charge': True}, 
 'status': 'outstanding', 
 'transactions': [{'links': {'filing': '/company/SC380228/filing-history/MzEzMDM4OTgxOGFkaXF6a2N4'}, 
 'filing_type': 'create-charge-with-deed', 
 'delivered_on': '2015-09-04'}], 
 'charge_number': 1
}]

I need to unpack this so it looks like a basic DataFrame. Example as follows:

[image: example of the desired output]

Any suggestions on how I could do this?

CodePudding user response：

Couldn't be simpler:

from pandas import DataFrame
_list = [{'classification': 
 {'description': 'A registered charge', 
 'type': 'charge-description'}, 
 'charge_code': 'SC3802280001', 
 'etag': '157167f8f780f440048f4056da17784dfafe64e5', 
 'delivered_on': '2015-09-04', 
 'persons_entitled': [{'name': 'The Royal Bank of Scotland PLC'}], 
 'created_on': '2015-09-03', 
 'links': {'self': '/company/SC380228/charges/IKH-4F5A4YmihSPe9D8Mq-WAJDw'}, 
 'particulars': {'floating_charge_covers_all': True, 
 'contains_negative_pledge': True, 
 'contains_floating_charge': True}, 
 'status': 'outstanding', 
 'transactions': [{'links': {'filing': '/company/SC380228/filing-history/MzEzMDM4OTgxOGFkaXF6a2N4'}, 
 'filing_type': 'create-charge-with-deed', 
 'delivered_on': '2015-09-04'}], 
 'charge_number': 1
}]

print(DataFrame(_list))

Output:

                                      classification   charge_code                                      etag delivered_on  ...                                        particulars       status                                       transactions charge_number
0  {'description': 'A registered charge', 'type':...  SC3802280001  157167f8f780f440048f4056da17784dfafe64e5   2015-09-04  ...  {'floating_charge_covers_all': True, 'contains...  outstanding  [{'links': {'filing': '/company/SC380228/filin...             1

[1 rows x 11 columns]

CodePudding user response：

The problem you face here is that your data is deeply nested: it has many levels, so it doesn't fit particularly well into a 2D dataframe.

Consider, for example, what that dictionary looks like if you break it out into all its constituent levels:

data = [
    {
        'classification': {
            'description': 'A registered charge', 
            'type': 'charge-description',
        }, 
        'charge_code': 'SC3802280001', 
        'etag': '157167f8f780f440048f4056da17784dfafe64e5', 
        'delivered_on': '2015-09-04',
        'persons_entitled': [
            {
                'name': 'The Royal Bank of Scotland PLC',
            }
        ], 
        'created_on': '2015-09-03',
        'links': {
            'self': '/company/SC380228/charges/IKH-4F5A4YmihSPe9D8Mq-WAJDw',
        }, 
        'particulars': {
            'floating_charge_covers_all': True, 
            'contains_negative_pledge': True, 
            'contains_floating_charge': True,
        }, 
        'status': 'outstanding', 
        'transactions': [
            {
                'links': {
                    'filing': '/company/SC380228/filing-history/MzEzMDM4OTgxOGFkaXF6a2N4',
                },
                'filing_type': 'create-charge-with-deed', 
                'delivered_on': '2015-09-04',
            }
        ], 
        'charge_number': 1,
    }
]

There isn't an obvious column level nor an obvious values level, so you'll end up with some dataframe cells that are themselves dictionaries of key/value pairs (whose values can in turn be key/value pairs!).
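
To see which fields cause this, a quick check along the following lines lists the top-level keys whose values are themselves containers. This is only a small sketch on a trimmed-down copy of the record; the names record and nested_keys are illustrative and not part of the original data.

```python
# Trimmed-down copy of the record from the question (illustrative subset)
record = {
    'classification': {
        'description': 'A registered charge',
        'type': 'charge-description',
    },
    'charge_code': 'SC3802280001',
    'persons_entitled': [{'name': 'The Royal Bank of Scotland PLC'}],
    'status': 'outstanding',
    'charge_number': 1,
}

# Keys whose values are dicts or lists end up as whole objects in one cell
nested_keys = [
    key for key, value in record.items()
    if isinstance(value, (dict, list))
]

print(nested_keys)  # ['classification', 'persons_entitled']
```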

One thing we can do here is traverse the dictionary recursively to get to the key/value pair at the lowest level of each branch of the dictionary:

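def unpack(item):

    # Start with an empty result so that anything that is neither a list
    # nor a dict (e.g. a bare string inside a list) unpacks to nothing
    # instead of raising UnboundLocalError at the final return
    root_dict = {}

    # For list items, loop over the elements and unpack them
    if isinstance(item, list):

        for elem in item:
            root_dict.update(unpack(elem))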

    # For dict items, loop over the keys and values and either 
    # unpack them or store them
    if isinstance(item, dict):
        sub_dict = {}

        for key, value in item.items():

            # If the value needs further unpacking, unpack it and add it
            # to our sub_dict
            if isinstance(value, (dict, list)):
                sub_dict.update(unpack(value))

            # If we've reached the bottom level, add key/value to sub_dict
            else:
                sub_dict[key] = [value]
        
        # Send the sub_dict back up through the stack (or return it and 
        # end execution if what we passed in didn't contain any lists)
        return sub_dict
    
    # Return the root_dict that we've been updating with our sub_dict items
    # (assuming what we passed in contained lists)
    return root_dict

That recursive function will return a dictionary that follows a simple one-key, one-list-value structure, i.e.:

unpacked_dict = {
    'description': ['A registered charge'], 
    'type': ['charge-description'],
    'charge_code': ['SC3802280001'], 
    'etag': ['157167f8f780f440048f4056da17784dfafe64e5'], 
    'delivered_on': ['2015-09-04'],
    'name': ['The Royal Bank of Scotland PLC'],
    'created_on': ['2015-09-03'],
    'self': ['/company/SC380228/charges/IKH-4F5A4YmihSPe9D8Mq-WAJDw'],
    'floating_charge_covers_all': [True], 
    'contains_negative_pledge': [True], 
    'contains_floating_charge': [True],
    'status': ['outstanding'], 
    'filing': ['/company/SC380228/filing-history/MzEzMDM4OTgxOGFkaXF6a2N4'],
    'filing_type': ['create-charge-with-deed'], 
    # (the 'delivered_on' inside 'transactions' overwrites the top-level
    # entry above, since a dict holds each key only once)
    'charge_number': [1],
}

You can then pass that unpacked dictionary to pandas with pd.DataFrame(unpacked_dict) (after import pandas as pd), which gives you a one-row dataframe with one column per key.
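
One caveat: because unpack keeps only the innermost key, fields that share a name at different levels (here the top-level 'delivered_on' and the one inside 'transactions') overwrite each other. A possible variation, sketched below, prefixes each key with the path that leads to it. The flatten name, the '.' separator, and the use of list positions in the path are my own choices for illustration, not part of the function above.

```python
def flatten(item, prefix=''):
    """Flatten nested dicts/lists into one dict keyed by dotted paths."""
    flat = {}

    if isinstance(item, dict):
        for key, value in item.items():
            flat.update(flatten(value, f'{prefix}{key}.'))

    elif isinstance(item, list):
        # List positions become part of the path, e.g. 'transactions.0.'
        for index, elem in enumerate(item):
            flat.update(flatten(elem, f'{prefix}{index}.'))

    else:
        # Bottom level: drop the trailing separator and store the value
        flat[prefix[:-1]] = item

    return flat


# Trimmed-down copy of the record from the question (illustrative subset)
record = {
    'delivered_on': '2015-09-04',
    'persons_entitled': [{'name': 'The Royal Bank of Scotland PLC'}],
    'transactions': [
        {
            'filing_type': 'create-charge-with-deed',
            'delivered_on': '2015-09-04',
        }
    ],
}

flat = flatten(record)

for path, value in flat.items():
    print(path, '=', value)
```

Each record flattened this way is a plain dict of scalar values, so a list of them, e.g. [flatten(rec) for rec in data], can be passed to pd.DataFrame to get one row per record. Note that empty lists and empty dicts produce no column in this sketch.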