Merging two lists of dicts with different keys efficiently

I've got two lists:

lst1 = [{"name": "Hanna", "age":3},
        {"name": "Kris", "age": 18},
        {"name":"Dom", "age": 15},
        {"name":"Tom", "age": 5}]

The second list contains some of the names above, stored under a different key:

lst2 = [{"username": "Kris", "Town": "Big City"},
        {"username":"Dom", "Town": "NYC"}]

I would like to merge them to get this result:

lst = [{"name": "Hanna", "age": 3},
       {"name": "Kris", "age": 18, "Town": "Big City"},
       {"name": "Dom", "age": 15, "Town": "NYC"},
       {"name": "Tom", "age": 5}]

The simplest way is to go through lst1 one element at a time and search lst2 for a match, but for big lists this is quite inefficient (my lists have a few hundred elements each). What is the most efficient way to achieve this?

CodePudding user response：

To avoid scanning the second list again for every element of the first, you can build a name index first.

lst1 = [{"name": "Hanna", "age":3},
        {"name": "Kris", "age": 18},
        {"name":"Dom", "age": 15},
        {"name":"Tom", "age": 5}]
lst2 = [{"username": "Kris", "Town": "Big City"},
        {"username":"Dom", "Town": "NYC"}]

# Map each username to its position in lst2, skipping entries without one
name_index = {dic['username']: idx for idx, dic in enumerate(lst2) if dic.get('username')}

for dic in lst1:
    name = dic.get('name')
    if name in name_index:
        dic.update(lst2[name_index[name]])  # update in place to save time
        dic.pop('username')  # drop the duplicate key copied over from lst2

print(lst1)
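
If you would rather not modify lst1 in place, the same index idea can be wrapped in a small function that returns new dicts. This is only a sketch: the function name merge_on_key and its parameters are made up for illustration, and it assumes that when a username appears more than once in the second list, the last entry should win.

```python
def merge_on_key(left, right, left_key="name", right_key="username"):
    """Return new dicts: each left entry plus the fields of its match in right."""
    # One pass over right builds the lookup table; later duplicates overwrite earlier ones.
    index = {d[right_key]: d for d in right if right_key in d}

    merged = []
    for entry in left:
        combined = dict(entry)  # shallow copy, so the input list is not modified
        match = index.get(entry.get(left_key))
        if match is not None:
            # Copy every field except the join key itself
            combined.update((k, v) for k, v in match.items() if k != right_key)
        merged.append(combined)
    return merged


lst1 = [{"name": "Hanna", "age": 3},
        {"name": "Kris", "age": 18},
        {"name": "Dom", "age": 15},
        {"name": "Tom", "age": 5}]
lst2 = [{"username": "Kris", "Town": "Big City"},
        {"username": "Dom", "Town": "NYC"}]

result = merge_on_key(lst1, lst2)
print(result)
```

Each list is walked once, so the work grows with len(lst1) + len(lst2) instead of their product.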

CodePudding user response：

One way to do this much more efficiently than by scanning lists is to create an intermediate dictionary from lst1 with name as the key, so that you're searching a dictionary, not a list.

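# Index copies of the lst1 entries by name (the originals stay untouched)
d1 = {elem['name']: dict(elem) for elem in lst1}

for elem in lst2:
    target = d1.get(elem['username'])
    if target is not None:  # skip usernames that have no match in lst1
        target.update({k: v for k, v in elem.items() if k != 'username'})

lst = list(d1.values())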

Output:

[{'name': 'Hanna', 'age': 3}, {'name': 'Kris', 'age': 18, 'Town': 'Big City'}, {'name': 'Dom', 'age': 15, 'Town': 'NYC'}, {'name': 'Tom', 'age': 5}]

(Edited to use only one intermediate dict.)
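
If you want to check the dictionary-versus-list claim on your own machine, a rough timing sketch with timeit could look like the following. The data is synthetic (500 made-up users, every second one also present in the second list), and merge_nested and merge_indexed are illustrative names, not functions from the answer above. Actual timings depend on your machine, so none are quoted here.

```python
import timeit

# Synthetic data (an assumption): 500 people, every second one also in lst2.
lst1 = [{"name": f"user{i}", "age": i % 90} for i in range(500)]
lst2 = [{"username": f"user{i}", "Town": f"town{i}"} for i in range(0, 500, 2)]


def merge_nested(left, right):
    """Baseline: scan right from the start for every element of left."""
    out = []
    for a in left:
        row = dict(a)
        for b in right:
            if b["username"] == a["name"]:
                row.update({k: v for k, v in b.items() if k != "username"})
                break
        out.append(row)
    return out


def merge_indexed(left, right):
    """Dictionary approach: index left by name, then one pass over right."""
    by_name = {a["name"]: dict(a) for a in left}
    for b in right:
        row = by_name.get(b["username"])
        if row is not None:
            row.update({k: v for k, v in b.items() if k != "username"})
    return list(by_name.values())


# Both strategies should produce the same merged list.
assert merge_nested(lst1, lst2) == merge_indexed(lst1, lst2)

for fn in (merge_nested, merge_indexed):
    seconds = timeit.timeit(lambda: fn(lst1, lst2), number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```

Note that the indexed version relies on names in lst1 being unique; repeated names would collapse into a single dictionary entry.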

CodePudding user response：

Use the zip function to pair the two lists. Both lists have to be sorted by the value you match on, name for lst1 and username for lst2, which is why sorted is called with the key parameter. Without sorting, matching entries would not line up.

lst2 also needs one extra step: I repeat its entries to account for the length of lst1, using lst2 * abs(len(lst1) - len(lst2)). The loop then makes a single pass over the zip object, although the two sorts add O(n log n) work before it. Be aware that zip pairs entries purely by position, so only entries that land at the same index after sorting get merged. That happens to work for the sample data, but it is not guaranteed for other inputs (for example, when both lists have the same length, lst2 * 0 is empty and nothing is merged).

for d1, d2 in zip(sorted(lst1, key=lambda d: d['name']),
                  sorted(lst2 * abs(len(lst1) - len(lst2)), key=lambda d: d['username'])):
    if d1['name'] == d2['username']:
        d1.update(d2)
        # Remove the username key copied over from lst2
        del d1['username']

print(lst1)

Output:

[{'name': 'Hanna', 'age': 3}, {'name': 'Kris', 'age': 18, 'Town': 'Big City'}, {'name': 'Dom', 'age': 15, 'Town': 'NYC'}, {'name': 'Tom', 'age': 5}]
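
If you want a sort-based merge that does not depend on the two lists lining up by position, one option is to walk both sorted lists with a second index instead of zip. This is a different technique from the code above (a two-pointer merge), shown only as a sketch. The name merge_sorted is made up, and the function returns copies ordered by name rather than in the original order of lst1.

```python
def merge_sorted(left, right):
    """Merge right into copies of left by walking both lists in sorted order."""
    left_sorted = sorted((dict(d) for d in left), key=lambda d: d["name"])
    right_sorted = sorted(right, key=lambda d: d["username"])

    j = 0
    for entry in left_sorted:
        # Advance past usernames that sort before the current name;
        # they cannot match it or any later name.
        while j < len(right_sorted) and right_sorted[j]["username"] < entry["name"]:
            j += 1
        if j < len(right_sorted) and right_sorted[j]["username"] == entry["name"]:
            entry.update({k: v for k, v in right_sorted[j].items() if k != "username"})
    return left_sorted


lst1 = [{"name": "Hanna", "age": 3},
        {"name": "Kris", "age": 18},
        {"name": "Dom", "age": 15},
        {"name": "Tom", "age": 5}]
lst2 = [{"username": "Kris", "Town": "Big City"},
        {"username": "Dom", "Town": "NYC"}]

print(merge_sorted(lst1, lst2))
```

The sorting still costs O(n log n), so for plain lookups a dictionary index is usually the simpler choice; this variant mainly makes sense when the data is already sorted.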