Given a very large list, called tsv_data, which resembles:
{'id':1,'name':'bob','size':2},
{'id':2,'name':'bob','size':3},
{'id':3,'name':'sarah','size':2},
{'id':4,'name':'sarah','size':2},
{'id':5,'name':'sarah','size':3},
{'id':6,'name':'sarah','size':3},
{'id':7,'name':'jack','size':5},
And a separate set of all unique name strings therein, called names:
{'bob','sarah','jack'}
The aim is to produce the following data structure:
[
{'name':'bob','children':
[
{'id':1,'size':2},
{'id':2,'size':3}
]
},
{'name':'sarah','children':
[
{'id':3,'size':2},
{'id':4,'size':2},
{'id':5,'size':3},
{'id':6,'size':3}
]
},
{'name':'jack','children':
[
{'id':7,'size':5}
]
}
]
This is challenging for me to restructure with a for loop, as each name has a different number of entries.
Is there a Python solution that is robust to the number of items per name? Please demonstrate, thanks.
CodePudding user response:
Here is a straightforward solution.
tsv_data = [
{'id':1,'name':'bob','size':2},
{'id':2,'name':'bob','size':3},
{'id':3,'name':'sarah','size':2},
{'id':4,'name':'sarah','size':2},
{'id':5,'name':'sarah','size':3},
{'id':6,'name':'sarah','size':3},
{'id':7,'name':'jack','size':5}
]
names = {'bob','sarah','jack'}
expected_keys = ('id', 'size')
result = []
for name in names:
    result.append({'name': name,
                   'children': [{k: v for k, v in d.items() if k in expected_keys}
                                for d in tsv_data if d.get('name') == name]})
# result (order of the outer list may vary, since names is a set):
# [{'name': 'sarah',
# 'children': [{'id': 3, 'size': 2},
# {'id': 4, 'size': 2},
# {'id': 5, 'size': 3},
# {'id': 6, 'size': 3}]},
# {'name': 'bob', 'children': [{'id': 1, 'size': 2}, {'id': 2, 'size': 3}]},
# {'name': 'jack', 'children': [{'id': 7, 'size': 5}]}]
This solution iterates over the whole tsv_data once for each name, so it does len(names) * len(tsv_data) comparisons. If tsv_data or names is large and you need it to run fast, you could instead build a dictionary that groups the rows of tsv_data by name in a single pass.
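As a rough sketch, that single-pass grouping could look like the following, using collections.defaultdict (note this also makes the separate names set unnecessary, since the names are discovered while grouping):

```python
from collections import defaultdict

tsv_data = [
    {'id': 1, 'name': 'bob', 'size': 2},
    {'id': 2, 'name': 'bob', 'size': 3},
    {'id': 3, 'name': 'sarah', 'size': 2},
    {'id': 4, 'name': 'sarah', 'size': 2},
    {'id': 5, 'name': 'sarah', 'size': 3},
    {'id': 6, 'name': 'sarah', 'size': 3},
    {'id': 7, 'name': 'jack', 'size': 5},
]

# Group the children by name in one pass over tsv_data.
groups = defaultdict(list)
for d in tsv_data:
    groups[d['name']].append({'id': d['id'], 'size': d['size']})

# Build the final structure; since dicts preserve insertion order
# (Python 3.7+), names appear in the order they first occur in tsv_data.
result = [{'name': name, 'children': children}
          for name, children in groups.items()]
```

Here each row of tsv_data is visited exactly once, so the grouping is linear in the size of the input rather than proportional to len(names) * len(tsv_data).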