I have this list, which contains multiple dicts and lists.
[{'classification':
{'description': 'A registered charge',
'type': 'charge-description'},
'charge_code': 'SC3802280001',
'etag': '157167f8f780f440048f4056da17784dfafe64e5',
'delivered_on': '2015-09-04',
'persons_entitled': [{'name': 'The Royal Bank of Scotland PLC'}],
'created_on': '2015-09-03',
'links': {'self': '/company/SC380228/charges/IKH-4F5A4YmihSPe9D8Mq-WAJDw'},
'particulars': {'floating_charge_covers_all': True,
'contains_negative_pledge': True,
'contains_floating_charge': True},
'status': 'outstanding',
'transactions': [{'links': {'filing': '/company/SC380228/filing-history/MzEzMDM4OTgxOGFkaXF6a2N4'},
'filing_type': 'create-charge-with-deed',
'delivered_on': '2015-09-04'}],
'charge_number': 1
}]
I need to unpack this to look like a basic DF. Example as follows:
Any suggestions on how I could do this?
CodePudding user response:
Couldn't be simpler:
from pandas import DataFrame

_list = [{'classification':
              {'description': 'A registered charge',
               'type': 'charge-description'},
          'charge_code': 'SC3802280001',
          'etag': '157167f8f780f440048f4056da17784dfafe64e5',
          'delivered_on': '2015-09-04',
          'persons_entitled': [{'name': 'The Royal Bank of Scotland PLC'}],
          'created_on': '2015-09-03',
          'links': {'self': '/company/SC380228/charges/IKH-4F5A4YmihSPe9D8Mq-WAJDw'},
          'particulars': {'floating_charge_covers_all': True,
                          'contains_negative_pledge': True,
                          'contains_floating_charge': True},
          'status': 'outstanding',
          'transactions': [{'links': {'filing': '/company/SC380228/filing-history/MzEzMDM4OTgxOGFkaXF6a2N4'},
                            'filing_type': 'create-charge-with-deed',
                            'delivered_on': '2015-09-04'}],
          'charge_number': 1
          }]

print(DataFrame(_list))
Output:
classification charge_code etag delivered_on ... particulars status transactions charge_number
0 {'description': 'A registered charge', 'type':... SC3802280001 157167f8f780f440048f4056da17784dfafe64e5 2015-09-04 ... {'floating_charge_covers_all': True, 'contains... outstanding [{'links': {'filing': '/company/SC380228/filin... 1
[1 rows x 11 columns]
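If the nested dicts should become columns of their own rather than dict-valued cells, one option is pandas' json_normalize. The following is a minimal sketch, assuming pandas 1.0 or later (where pd.json_normalize is available at the top level); the record is a trimmed copy of the one in the question, and the names records and flat are just illustrative.

```python
import pandas as pd

# Trimmed copy of the record from the question (fewer keys, same shapes)
records = [{
    "classification": {"description": "A registered charge",
                       "type": "charge-description"},
    "charge_code": "SC3802280001",
    "links": {"self": "/company/SC380228/charges/IKH-4F5A4YmihSPe9D8Mq-WAJDw"},
    "transactions": [{"filing_type": "create-charge-with-deed",
                      "delivered_on": "2015-09-04"}],
}]

# Nested dicts become dotted column names; lists are left as single cells
flat = pd.json_normalize(records)
print(sorted(flat.columns))
```

Note that list-valued fields such as 'transactions' still end up as a single cell holding a list, so they would need separate handling.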
CodePudding user response:
The problem you face here is that your data is deeply nested: it has many levels, so it doesn't fit particularly well into a 2D dataframe.
Consider, for example, what that dictionary looks like if you break it out into all its constituent levels:
data = [
    {
        'classification': {
            'description': 'A registered charge',
            'type': 'charge-description',
        },
        'charge_code': 'SC3802280001',
        'etag': '157167f8f780f440048f4056da17784dfafe64e5',
        'delivered_on': '2015-09-04',
        'persons_entitled': [
            {
                'name': 'The Royal Bank of Scotland PLC',
            }
        ],
        'created_on': '2015-09-03',
        'links': {
            'self': '/company/SC380228/charges/IKH-4F5A4YmihSPe9D8Mq-WAJDw',
        },
        'particulars': {
            'floating_charge_covers_all': True,
            'contains_negative_pledge': True,
            'contains_floating_charge': True,
        },
        'status': 'outstanding',
        'transactions': [
            {
                'links': {
                    'filing': '/company/SC380228/filing-history/MzEzMDM4OTgxOGFkaXF6a2N4',
                },
                'filing_type': 'create-charge-with-deed',
                'delivered_on': '2015-09-04',
            }
        ],
        'charge_number': 1,
    }
]
There is no obvious column level or values level, so you'll end up with some dataframe cells that are themselves dictionaries of key/value pairs (whose values can in turn be key/value pairs!).
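To see which fields cause this, a quick check along these lines lists the top-level keys whose values are themselves containers. It is only a sketch on a trimmed copy of the record; the names record and nested are made up for illustration.

```python
# Trimmed copy of the record above (fewer keys, same shapes)
record = {
    "classification": {"description": "A registered charge",
                       "type": "charge-description"},
    "charge_code": "SC3802280001",
    "persons_entitled": [{"name": "The Royal Bank of Scotland PLC"}],
    "status": "outstanding",
}

# Map each key holding a dict or list to the name of its container type
nested = {
    key: type(value).__name__
    for key, value in record.items()
    if isinstance(value, (dict, list))
}
print(nested)  # {'classification': 'dict', 'persons_entitled': 'list'}
```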
One thing we can do here is traverse the dictionary recursively to get to the key/value pair at the lowest level of each branch of the dictionary:
def unpack(item):
    # Start empty so that scalars (e.g. strings inside a list) return {}
    root_dict = {}
    # For list items, loop over the elements and unpack them
    if isinstance(item, list):
        for elem in item:
            root_dict.update(unpack(elem))
    # For dict items, loop over the keys and values and either
    # unpack them or store them
    if isinstance(item, dict):
        sub_dict = {}
        for key, value in item.items():
            # If the value needs further unpacking, unpack it and add it
            # to our sub_dict
            if isinstance(value, (dict, list)):
                sub_dict.update(unpack(value))
            # If we've reached the bottom level, add key/value to sub_dict
            else:
                sub_dict[key] = [value]
        # Send the sub_dict back up through the stack (or return it and
        # end execution if what we passed in didn't contain any lists)
        return sub_dict
    # Return the root_dict that we've been updating with our sub_dict items
    # (assuming what we passed in contained lists)
    return root_dict
That recursive function will return a dictionary that follows a simple one-key, one-list-value structure, i.e.:
unpacked_dict = {
    'description': ['A registered charge'],
    'type': ['charge-description'],
    'charge_code': ['SC3802280001'],
    'etag': ['157167f8f780f440048f4056da17784dfafe64e5'],
    'delivered_on': ['2015-09-04'],
    'name': ['The Royal Bank of Scotland PLC'],
    'created_on': ['2015-09-03'],
    'self': ['/company/SC380228/charges/IKH-4F5A4YmihSPe9D8Mq-WAJDw'],
    'floating_charge_covers_all': [True],
    'contains_negative_pledge': [True],
    'contains_floating_charge': [True],
    'status': ['outstanding'],
    'filing': ['/company/SC380228/filing-history/MzEzMDM4OTgxOGFkaXF6a2N4'],
    'filing_type': ['create-charge-with-deed'],
    # 'delivered_on' under 'transactions' reuses the key above, so it
    # overwrites that entry (same value here) instead of adding a column
    'charge_number': [1],
}
You can then pass that unpacked dictionary to pandas with pd.DataFrame(unpacked_dict).
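One caveat with keeping only the innermost key is that the same key at two levels (for example 'delivered_on') collapses into one column. A possible variant, sketched below, builds each key from its full path instead. The flatten function, its prefix parameter, and the dotted naming are my own assumptions rather than anything pandas provides, and the record is a trimmed copy of the one in the question.

```python
def flatten(item, prefix=""):
    """Flatten nested dicts/lists into {dotted_path: [value]}."""
    if isinstance(item, dict):
        children = item.items()
    elif isinstance(item, list):
        # List positions become part of the path: 'transactions.0....'
        children = enumerate(item)
    else:
        # A scalar: store it under the path built so far
        return {prefix: [item]}
    flat = {}
    for key, value in children:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


# Trimmed copy of the record from the question (fewer keys, same shapes)
record = {
    "charge_code": "SC3802280001",
    "delivered_on": "2015-09-04",
    "persons_entitled": [{"name": "The Royal Bank of Scotland PLC"}],
    "transactions": [
        {"filing_type": "create-charge-with-deed", "delivered_on": "2015-09-04"}
    ],
}

flat = flatten(record)
for path in flat:
    print(path)
```

Both 'delivered_on' and 'transactions.0.delivered_on' survive as separate keys, and the resulting dictionary has the same one-key, one-list-value shape, so it can be handed to pd.DataFrame in the same way.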