Pythonic way of joining lists of dictionaries on a key

Suppose I have two lists of dictionaries, l1 and l2.

l1 = [
    { "id": 0, "foo": 0 },
    { "id": 1, "foo": 1 },
    { "id": 2, "foo": 2 },
    ...
]

l2 = [
    { "id": 0, "bar": 0 },
    { "id": 1, "bar": 1 },
    { "id": 2, "bar": 2 },
    ...
]

Is there a Pythonic way of joining the two lists together on a key, say "id"?

Expected output:

[
    { "id": 0, "foo": 0, "bar": 0 },
    { "id": 1, "foo": 1, "bar": 1 },
    { "id": 2, "foo": 2, "bar": 2 },
    ...
]

This can be achieved with a list comprehension, but it runs in O(NM) time, and it leaves a duplicate key-value pair in each merged dictionary if the join keys in l1 and l2 have different names.

[
    {**d1, **d2}
    for d1 in l1 for d2 in l2
    if d1["id"] == d2["id"]
]

Alternatively, setting readability aside, one could solve it in O(N + M) time like this:

# Map each id to its dictionary from l1.
# These dictionaries are updated in place to hold the merged entries.
d = { d1["id"]: d1 for d1 in l1 }

# Insert d2 entries into their corresponding dictionaries.
for d2 in l2:
    key = d2["id"]
    d[key].update({
        k: v
        for (k, v) in d2.items()
        if k != "id"
    })

# Convert the dictionary back into a list of dictionaries.
result = list(d.values())

Is there a better solution?

CodePudding user response:

"Pythonic" doesn't mean "use list comprehensions instead of for loops". For loops are perfectly Pythonic. Just use an intermediate dict as an index, build it with the .setdefault grouping idiom, and use itertools.chain to iterate over both lists in a single loop:

import itertools

index = {}

for d in itertools.chain(l1, l2):
    index.setdefault(d['id'], {}).update(d)

result = list(index.values())
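
As a quick check of what this does with unmatched ids, here is a minimal sketch on made-up sample data (the entries with id 2 only in l1 and id 3 only in l2 are assumptions added for illustration). It suggests the approach behaves like an outer join: ids that appear in only one list are kept, and the input dictionaries are not modified.

```python
import itertools

# Hypothetical sample data: id 2 appears only in l1, id 3 only in l2.
l1 = [{"id": 0, "foo": 0}, {"id": 1, "foo": 1}, {"id": 2, "foo": 2}]
l2 = [{"id": 0, "bar": 0}, {"id": 1, "bar": 1}, {"id": 3, "bar": 3}]

index = {}
for d in itertools.chain(l1, l2):
    # setdefault creates a fresh dict the first time an id is seen,
    # so the merged entries are new objects rather than the inputs.
    index.setdefault(d["id"], {}).update(d)

result = list(index.values())
print(result)
# [{'id': 0, 'foo': 0, 'bar': 0}, {'id': 1, 'foo': 1, 'bar': 1},
#  {'id': 2, 'foo': 2}, {'id': 3, 'bar': 3}]
```

If two dictionaries with the same id share another key, the later one in the chain wins, since update overwrites existing values.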

Alternatively, you could use a defaultdict instead of a plain dict with .setdefault (in this case I probably would, since the defaultdict is only an intermediate data structure):

import itertools
import collections

index = collections.defaultdict(dict)

for d in itertools.chain(l1, l2):
    index[d["id"]].update(d)

result = list(index.values())

CodePudding user response:

I would use a dictionary as in your last suggestion. There is no need to filter out the "id" key: its value is identical in both dictionaries, so the merge handles it seamlessly:

d2 = {d['id']: d for d in l2}

out = [{**d1, **d2.get(d1['id'], {})} for d1 in l1]

Output:

[{'id': 0, 'foo': 0, 'bar': 0},
 {'id': 1, 'foo': 1, 'bar': 1},
 {'id': 2, 'foo': 2, 'bar': 2}]

To update in place:

d2 = {d['id']: d for d in l2}

for d1 in l1:
    d1.update(d2.get(d1['id'], {}))
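
Both variants above assume the join key has the same name in both lists, and they behave like a left join: ids that appear only in l2 are dropped. If the key names differ (the case the question worries about), one possible sketch is the helper below. The name join_on, its parameters, and the "user_id" sample key are hypothetical, made up for illustration rather than taken from any library.

```python
def join_on(left, right, left_key="id", right_key="id"):
    """Left-join two lists of dicts (hypothetical helper, not a library function)."""
    # Index the right-hand rows by their join key, dropping that key so it
    # does not appear as a duplicate column in the merged rows.
    right_index = {
        row[right_key]: {k: v for k, v in row.items() if k != right_key}
        for row in right
    }
    # Build new dicts; rows of `left` without a match are copied unchanged.
    return [{**row, **right_index.get(row[left_key], {})} for row in left]


# Made-up sample data where the join key is named differently in each list.
users = [{"id": 0, "foo": 0}, {"id": 1, "foo": 1}]
extras = [{"user_id": 0, "bar": 0}, {"user_id": 2, "bar": 2}]

joined = join_on(users, extras, left_key="id", right_key="user_id")
print(joined)  # [{'id': 0, 'foo': 0, 'bar': 0}, {'id': 1, 'foo': 1}]
```

This still runs in roughly O(N + M) time. If several rows on the right share the same key, only the last one is kept, which may or may not be what you want.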