Variable tsv_data has the following structure:
[
{'id':1,'name':'bob','type':'blue','size':2},
{'id':2,'name':'bob','type':'blue','size':3},
{'id':3,'name':'bob','type':'blue','size':4},
{'id':4,'name':'bob','type':'red','size':2},
{'id':5,'name':'sarah','type':'blue','size':2},
{'id':6,'name':'sarah','type':'blue','size':3},
{'id':7,'name':'sarah','type':'green','size':2},
{'id':8,'name':'jack','type':'blue','size':5},
]
I would like to restructure it into:
[
    {'name':'bob', 'children':[
        {'name':'blue','children':[
            {'id':1, 'size':2},
            {'id':2, 'size':3},
            {'id':3, 'size':4}
        ]},
        {'name':'red','children':[
            {'id':4, 'size':2}
        ]}
    ]},
    {'name':'sarah', 'children':[
        {'name':'blue','children':[
            {'id':5, 'size':2},
            {'id':6, 'size':3},
        ]},
        {'name':'green','children':[
            {'id':7, 'size':2}
        ]}
    ]},
    {'name':'jack', 'children':[
        {'name':'blue', 'children':[
            {'id':8, 'size':5}
        ]}
    ]}
]
What is obstructing my progress is not knowing how many items will be in the children list for each major category. Similarly, we don't know in advance which categories will be present: it could be blue, green or red, all three, or any combination of them (for example, only red and green, or only green).
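Both unknowns (how many children there are, and which categories occur) can be read off the data itself rather than hard-coded. A minimal sketch, assuming tsv_data is the list shown above (redefined here so the snippet runs on its own); the name types_per_name is illustrative only:

```python
# Sample rows from the question, so this snippet is self-contained.
tsv_data = [
    {'id': 1, 'name': 'bob', 'type': 'blue', 'size': 2},
    {'id': 2, 'name': 'bob', 'type': 'blue', 'size': 3},
    {'id': 3, 'name': 'bob', 'type': 'blue', 'size': 4},
    {'id': 4, 'name': 'bob', 'type': 'red', 'size': 2},
    {'id': 5, 'name': 'sarah', 'type': 'blue', 'size': 2},
    {'id': 6, 'name': 'sarah', 'type': 'blue', 'size': 3},
    {'id': 7, 'name': 'sarah', 'type': 'green', 'size': 2},
    {'id': 8, 'name': 'jack', 'type': 'blue', 'size': 5},
]

# Collect the types that actually occur for each name; no colour is hard-coded.
types_per_name = {}
for row in tsv_data:
    types_per_name.setdefault(row['name'], set()).add(row['type'])

# Sort each set only to get a stable, readable printout.
summary = {name: sorted(types) for name, types in types_per_name.items()}
print(summary)
# {'bob': ['blue', 'red'], 'sarah': ['blue', 'green'], 'jack': ['blue']}
```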
Question
How might we devise a fool-proof way to compile the flat list of dicts contained in tsv_data into a multi-tier hierarchical data structure like the one above?
Answer
Given your major categories as a list, ordered from the outermost level to the innermost:
categories = ['name', 'type']
You can first transform the input data into a nested dict of lists. Accessing children by key in a nested dict is easier and more efficient than searching through your desired output format, a nested list of dicts:
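tree = {}
for record in tsv_data:
    record = dict(record)  # copy, so pop() below does not modify tsv_data
    node = tree
    # walk (or create) one nested dict per category except the last
    for category in categories[:-1]:
        node = node.setdefault(record.pop(category), {})
    # the last category maps to a list of the remaining fields
    node.setdefault(record.pop(categories[-1]), []).append(record)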
tree would then become:
{'bob': {'blue': [{'id': 1, 'size': 2}, {'id': 2, 'size': 3}, {'id': 3, 'size': 4}], 'red': [{'id': 4, 'size': 2}]}, 'sarah': {'blue': [{'id': 5, 'size': 2}, {'id': 6, 'size': 3}], 'green': [{'id': 7, 'size': 2}]}, 'jack': {'blue': [{'id': 8, 'size': 5}]}}
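The loop does not depend on how many categories there are or which values occur, so the same code handles any combination of colours and any category order. As a sketch, here it is wrapped in a hypothetical helper build_tree (not part of the original answer) that copies each record so the input list is left unchanged, applied with the order reversed to group by colour first:

```python
def build_tree(records, categories):
    """Group flat records into nested dicts, one level per category.

    Hypothetical helper wrapping the loop above; categories must be non-empty.
    """
    tree = {}
    for record in records:
        record = dict(record)  # shallow copy so pop() leaves the input intact
        node = tree
        for category in categories[:-1]:
            node = node.setdefault(record.pop(category), {})
        node.setdefault(record.pop(categories[-1]), []).append(record)
    return tree


tsv_data = [
    {'id': 1, 'name': 'bob', 'type': 'blue', 'size': 2},
    {'id': 2, 'name': 'bob', 'type': 'blue', 'size': 3},
    {'id': 3, 'name': 'bob', 'type': 'blue', 'size': 4},
    {'id': 4, 'name': 'bob', 'type': 'red', 'size': 2},
    {'id': 5, 'name': 'sarah', 'type': 'blue', 'size': 2},
    {'id': 6, 'name': 'sarah', 'type': 'blue', 'size': 3},
    {'id': 7, 'name': 'sarah', 'type': 'green', 'size': 2},
    {'id': 8, 'name': 'jack', 'type': 'blue', 'size': 5},
]

# Group by colour first, then by name.
by_type = build_tree(tsv_data, ['type', 'name'])

# Children are reached by key lookup instead of scanning a list.
print(by_type['red']['bob'])  # [{'id': 4, 'size': 2}]
print(list(by_type))          # ['blue', 'red', 'green']
```

Keys appear in first-seen order because Python dicts preserve insertion order (guaranteed since Python 3.7).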
You can then transform the nested dict to your desired output structure with a recursive function:
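def transform(node):
    # a dict is an inner level: turn each key into a {'name', 'children'} node
    if isinstance(node, dict):
        return [
            {'name': name, 'children': transform(child)}
            for name, child in node.items()
        ]
    # a list is the leaf level: return the records unchanged
    return node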
so that transform(tree) would return:
[{'name': 'bob', 'children': [{'name': 'blue', 'children': [{'id': 1, 'size': 2}, {'id': 2, 'size': 3}, {'id': 3, 'size': 4}]}, {'name': 'red', 'children': [{'id': 4, 'size': 2}]}]}, {'name': 'sarah', 'children': [{'name': 'blue', 'children': [{'id': 5, 'size': 2}, {'id': 6, 'size': 3}]}, {'name': 'green', 'children': [{'id': 7, 'size': 2}]}]}, {'name': 'jack', 'children': [{'name': 'blue', 'children': [{'id': 8, 'size': 5}]}]}]
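Putting both steps together, a sketch of an end-to-end helper; restructure is a hypothetical name, and build_tree is the same grouping loop with a copy of each record so tsv_data is not mutated. Because transform recurses until it reaches a list, adding a third category such as 'size' simply produces one more level:

```python
def build_tree(records, categories):
    # Nested dicts, one level per category; leaves are lists of leftover fields.
    tree = {}
    for record in records:
        record = dict(record)  # copy so the caller's records are not modified
        node = tree
        for category in categories[:-1]:
            node = node.setdefault(record.pop(category), {})
        node.setdefault(record.pop(categories[-1]), []).append(record)
    return tree


def transform(node):
    # Inner dict levels become lists of {'name': ..., 'children': ...} nodes.
    if isinstance(node, dict):
        return [{'name': name, 'children': transform(child)}
                for name, child in node.items()]
    return node  # leaf list of records


def restructure(records, categories):
    """Hypothetical wrapper: flat list of dicts -> nested name/children list."""
    return transform(build_tree(records, categories))


tsv_data = [
    {'id': 1, 'name': 'bob', 'type': 'blue', 'size': 2},
    {'id': 2, 'name': 'bob', 'type': 'blue', 'size': 3},
    {'id': 3, 'name': 'bob', 'type': 'blue', 'size': 4},
    {'id': 4, 'name': 'bob', 'type': 'red', 'size': 2},
    {'id': 5, 'name': 'sarah', 'type': 'blue', 'size': 2},
    {'id': 6, 'name': 'sarah', 'type': 'blue', 'size': 3},
    {'id': 7, 'name': 'sarah', 'type': 'green', 'size': 2},
    {'id': 8, 'name': 'jack', 'type': 'blue', 'size': 5},
]

two_levels = restructure(tsv_data, ['name', 'type'])
three_levels = restructure(tsv_data, ['name', 'type', 'size'])
print(three_levels[0]['children'][0]['children'][0])
# {'name': 2, 'children': [{'id': 1}]}
```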
Demo: https://replit.com/@blhsing/NotableCourageousTranslations
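For comparison, the same nesting can be built with a different technique, itertools.groupby, recursing one category at a time. This is an alternative sketch, not the approach above: groupby only merges adjacent items, so each level is sorted first. As a result the output is ordered alphabetically (jack before sarah) rather than by first appearance, and the values at each level must be mutually comparable. The function name nest is illustrative only:

```python
from itertools import groupby
from operator import itemgetter


def nest(records, categories):
    """Group records by categories[0], then recursively by the remaining ones."""
    if not categories:
        return records  # leaf level: the remaining fields of each record
    key, rest = categories[0], categories[1:]
    get_key = itemgetter(key)
    result = []
    # groupby needs equal keys to be adjacent, hence the sort (which is stable).
    for value, group in groupby(sorted(records, key=get_key), key=get_key):
        # Build new dicts without the grouped field; the input is not modified.
        children = [{k: v for k, v in r.items() if k != key} for r in group]
        result.append({'name': value, 'children': nest(children, rest)})
    return result


tsv_data = [
    {'id': 1, 'name': 'bob', 'type': 'blue', 'size': 2},
    {'id': 2, 'name': 'bob', 'type': 'blue', 'size': 3},
    {'id': 3, 'name': 'bob', 'type': 'blue', 'size': 4},
    {'id': 4, 'name': 'bob', 'type': 'red', 'size': 2},
    {'id': 5, 'name': 'sarah', 'type': 'blue', 'size': 2},
    {'id': 6, 'name': 'sarah', 'type': 'blue', 'size': 3},
    {'id': 7, 'name': 'sarah', 'type': 'green', 'size': 2},
    {'id': 8, 'name': 'jack', 'type': 'blue', 'size': 5},
]

nested = nest(tsv_data, ['name', 'type'])
print([node['name'] for node in nested])  # ['bob', 'jack', 'sarah']
```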