Given a very large list, called tsv_data, which resembles:
{'id':1,'name':'bob','size':2},
{'id':2,'name':'bob','size':3},
{'id':3,'name':'sarah','size':2},
{'id':4,'name':'sarah','size':2},
{'id':5,'name':'sarah','size':3},
{'id':6,'name':'sarah','size':3},
{'id':7,'name':'jack','size':5},
And a separate set of all unique name strings therein, called names:
{'bob','sarah','jack'}
The aim is to produce the following data structure:
[
{'name':'bob','children':
[
{'id':1,'size':2},
{'id':2,'size':3}
]
},
{'name':'sarah','children':
[
{'id':3,'size':2},
{'id':4,'size':2},
{'id':5,'size':3},
{'id':6,'size':3}
]
},
{'name':'jack','children':
[
{'id':7,'size':5}
]
}
]
This is challenging for me to restructure with a for loop, as each name has a different number of entries.
Is there a Python solution that is robust to the number of items per name? Please demonstrate, thanks.
CodePudding user response:
Here is a straightforward solution.
tsv_data = [
{'id':1,'name':'bob','size':2},
{'id':2,'name':'bob','size':3},
{'id':3,'name':'sarah','size':2},
{'id':4,'name':'sarah','size':2},
{'id':5,'name':'sarah','size':3},
{'id':6,'name':'sarah','size':3},
{'id':7,'name':'jack','size':5}
]
names = {'bob','sarah','jack'}
expected_keys = ('id', 'size')
result = []
for name in names:
    result.append({'name': name,
                   'children': [{k: v for k, v in d.items() if k in expected_keys}
                                for d in tsv_data if d.get('name') == name]})
# result (order of the outer list may vary, since names is a set):
# [{'name': 'sarah',
# 'children': [{'id': 3, 'size': 2},
# {'id': 4, 'size': 2},
# {'id': 5, 'size': 3},
# {'id': 6, 'size': 3}]},
# {'name': 'bob', 'children': [{'id': 1, 'size': 2}, {'id': 2, 'size': 3}]},
# {'name': 'jack', 'children': [{'id': 7, 'size': 5}]}]
This solution iterates over the whole tsv_data once for each name, so it does len(names) * len(tsv_data) comparisons. If tsv_data or names is large and you need it to run fast, you could instead build a dictionary that groups the rows of tsv_data by name in a single pass.
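As a rough sketch, that single-pass grouping could look like the following, using collections.defaultdict (note this also makes the separate names set unnecessary, since the names are discovered while grouping):

```python
from collections import defaultdict

tsv_data = [
    {'id': 1, 'name': 'bob', 'size': 2},
    {'id': 2, 'name': 'bob', 'size': 3},
    {'id': 3, 'name': 'sarah', 'size': 2},
    {'id': 4, 'name': 'sarah', 'size': 2},
    {'id': 5, 'name': 'sarah', 'size': 3},
    {'id': 6, 'name': 'sarah', 'size': 3},
    {'id': 7, 'name': 'jack', 'size': 5},
]

# Group the children by name in one pass over tsv_data.
groups = defaultdict(list)
for d in tsv_data:
    groups[d['name']].append({'id': d['id'], 'size': d['size']})

# Build the final structure; since dicts preserve insertion order
# (Python 3.7+), names appear in the order they first occur in tsv_data.
result = [{'name': name, 'children': children}
          for name, children in groups.items()]
```

Here each row of tsv_data is visited exactly once, so the grouping is linear in the size of the input rather than proportional to len(names) * len(tsv_data).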