Suppose I have two lists of dictionaries, l1 and l2.
l1 = [
{ "id": 0, "foo": 0 },
{ "id": 1, "foo": 1 },
{ "id": 2, "foo": 2 },
...
]
l2 = [
{ "id": 0, "bar": 0 },
{ "id": 1, "bar": 1 },
{ "id": 2, "bar": 2 },
...
]
Is there a Pythonic way of joining the two lists together on a key, say "id"?
Expected output:
[
{ "id": 0, "foo": 0, "bar": 0 },
{ "id": 1, "foo": 1, "bar": 1 },
{ "id": 2, "foo": 2, "bar": 2 },
...
]
This can be achieved with a comprehension, but it runs inefficiently in O(NM) time, and it creates a duplicate key-value pair if the join key is named differently in l1 and l2.
[
    {**d1, **d2}
    for d1 in l1 for d2 in l2
    if d1["id"] == d2["id"]
]
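To make the duplicate-pair problem concrete, here is a small sketch. It assumes, purely for illustration, that the second list calls its key "key" instead of "id"; the names a and b are hypothetical sample data, not part of the lists above.

```python
# Hypothetical sample data: the join key is "id" in a but "key" in b.
a = [{"id": 0, "foo": 0}, {"id": 1, "foo": 1}]
b = [{"key": 0, "bar": 0}, {"key": 1, "bar": 1}]

# Same nested comprehension as above: every (d1, d2) pair is compared,
# so the work grows with len(a) * len(b).
merged = [
    {**d1, **d2}
    for d1 in a for d2 in b
    if d1["id"] == d2["key"]
]

print(merged)
# [{'id': 0, 'foo': 0, 'key': 0, 'bar': 0}, {'id': 1, 'foo': 1, 'key': 1, 'bar': 1}]
```

Each merged dictionary carries the join value twice, once under "id" and once under "key".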
Alternatively, setting readability aside, it can be solved more efficiently, in roughly O(N + M) time, with an intermediate dictionary:
# Create a mapping from the key of d1 to d1.
# This dictionary will combine the entries of d1 and d2.
d = { d1["id"]: d1 for d1 in l1 }
# Insert d2 entries into their corresponding dictionaries.
for d2 in l2:
    key = d2["id"]
    d[key].update({
        k: v
        for (k, v) in d2.items()
        if k != "id"
    })
# Convert the dictionary back into a list of dictionaries.
result = list(d.values())
Is there a better solution?
CodePudding user response:
"Pythonic" doesn't mean "use list comprehensions instead of for loops". For-loops are very pythonic. Just use an intermediate dict as an index. Use the .setdefault
grouping idiom. Use itertools to create convenient iterators that keep your code clean:
import itertools
index = {}
for d in itertools.chain(l1, l2):
    index.setdefault(d['id'], {}).update(d)
result = list(index.values())
Potentially, you could consider using a defaultdict instead of a plain dict with .setdefault (in this case, I probably would, since the defaultdict would just be an intermediate data structure):
import itertools
import collections
index = collections.defaultdict(dict)
for d in itertools.chain(l1, l2):
    index[d["id"]].update(d)
result = list(index.values())
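As a rough sketch of how this behaves when the ids only partly overlap, the same loop can be wrapped in a small function. The name join_on and the sample data are hypothetical, chosen only for illustration:

```python
import collections
import itertools

def join_on(key, *lists):
    """Merge dicts that share the same value for `key` (hypothetical helper)."""
    index = collections.defaultdict(dict)
    for d in itertools.chain(*lists):
        # Each id gets a fresh dict, so the input dicts are not mutated.
        index[d[key]].update(d)
    return list(index.values())

# Sample data with partially overlapping ids.
l1 = [{"id": 0, "foo": 0}, {"id": 1, "foo": 1}]
l2 = [{"id": 1, "bar": 1}, {"id": 2, "bar": 2}]

result = join_on("id", l1, l2)
print(result)
# [{'id': 0, 'foo': 0}, {'id': 1, 'foo': 1, 'bar': 1}, {'id': 2, 'bar': 2}]
```

Ids that appear in only one list are kept, so this behaves like a full outer join, and it extends to more than two lists without changes.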
CodePudding user response:
I would use a dictionary as in your last suggestion. There is no need to filter out the "id" key: its value is identical in both dictionaries, so update (or dict unpacking) will handle it seamlessly:
d2 = {d['id']: d for d in l2}
out = [{**d1, **d2.get(d1['id'], {})} for d1 in l1]
Output:
[{'id': 0, 'foo': 0, 'bar': 0},
{'id': 1, 'foo': 1, 'bar': 1},
{'id': 2, 'foo': 2, 'bar': 2}]
To update in place:
d2 = {d['id']: d for d in l2}
for d1 in l1:
    d1.update(d2.get(d1['id'], {}))
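Note that the .get(..., {}) fallback makes this behave like a left join: entries of l1 with no match in l2 are kept unchanged. A minimal sketch of the difference from an inner join, using hypothetical sample data with partially overlapping ids and an assumed lookup name by_id:

```python
# Hypothetical sample data: id 0 is only in l1, id 2 is only in l2.
l1 = [{"id": 0, "foo": 0}, {"id": 1, "foo": 1}]
l2 = [{"id": 1, "bar": 1}, {"id": 2, "bar": 2}]

# Lookup table from id to the matching l2 entry.
by_id = {d["id"]: d for d in l2}

# Left join: every entry of l1 survives, matched or not.
left = [{**d1, **by_id.get(d1["id"], {})} for d1 in l1]

# Inner join: keep only entries of l1 whose id also appears in l2.
inner = [{**d1, **by_id[d1["id"]]} for d1 in l1 if d1["id"] in by_id]

print(left)   # [{'id': 0, 'foo': 0}, {'id': 1, 'foo': 1, 'bar': 1}]
print(inner)  # [{'id': 1, 'foo': 1, 'bar': 1}]
```

If l2 contains the same id more than once, the dict comprehension keeps only the last such entry.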