Merging multiple dictionaries that have dictionaries in list

Time:07-29

I have several dictionaries (possibly dozens of them) that are structured like this:

{'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
            {'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
            {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 1},
            {'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1},
            {'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
 'stderr': ''}

I want to combine all of these dictionaries by summing the 'count' values of the entries that share the same 'foo', 'bar' and 'host' values (None is NoneType).

For example, given two dictionaries:

dictA = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
            {'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
            {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
            {'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1},
            {'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
 'stderr': ''}

dictB = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 280},
            {'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46},
            {'foo': 'A', 'bar': 'E', 'host': 'pineapple', 'count': 3},
            {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
            {'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
 'stderr': ''}

Then the merged version should be

dictMerged = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 415},
            {'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46},
            {'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
            {'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 4},
            {'foo': 'A', 'bar': 'E', 'host': 'pineapple', 'count': 3},
            {'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 2},
            {'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1}],
 'stderr': ''}

Note that the order of the dictionaries in the list changed once the 'count' values were summed.

As a first step, I tried to combine the entries that have the same 'host', as shown below, but the result was not what I wanted:

hostname1 = {i["host"]: i for i in dictA['stdout']}
hostname2 = {i["host"]: i for i in dictB['stdout']}
all_host = hostname1|hostname2
{key: value + b[key] for key, value in a.items()}

Answer:

One approach

from collections import defaultdict
from operator import itemgetter

# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
groups = defaultdict(list, {(d['foo'], d['bar'], d['host']): [d] for d in dictB['stdout']})
for d in dictA["stdout"]:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)

# use item getter for better readability
count = itemgetter("count")

# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]

# sort the list of dictionaries in decreasing order 
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
print(res)

Output (shown as formatted by pprint, which sorts the keys)

{'stderr': '',
 'stdout': [{'bar': 'B', 'count': 415, 'foo': 'A', 'host': None},
            {'bar': 'B', 'count': 46, 'foo': 'A', 'host': 'orange'},
            {'bar': 'B', 'count': 28, 'foo': 'C', 'host': 'egg'},
            {'bar': 'E', 'count': 4, 'foo': 'D', 'host': 'apple'},
            {'bar': 'E', 'count': 3, 'foo': 'A', 'host': 'pineapple'},
            {'bar': 'F', 'count': 2, 'foo': 'C', 'host': 'carrot'},
            {'bar': 'E', 'count': 1, 'foo': 'A', 'host': 'chicken breast'}]}

For more on each of the functions and data structures used in the code above see: sorted, defaultdict and itemgetter

One alternative

Use groupby:

import pprint
from operator import itemgetter
from itertools import groupby


def key(d):
    return d["foo"], d["bar"], d["host"] or ""


count = itemgetter("count")
lst = sorted(dictA["stdout"] + dictB["stdout"], key=key)
groups = groupby(lst, key=key)
ds = [{'foo': f, 'bar': b, 'host': h or None, 'count': sum(count(d) for d in vs)} for (f, b, h), vs in groups]
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
print(res)

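This second approach has two caveats:

  1. groupby only groups adjacent items, so the combined list must be sorted first. That makes the grouping step O(n log n), whereas grouping with a dictionary in the first approach is O(n).
  2. None cannot be compared with a str, so in order to sort the list of dictionaries the key function has to replace None with the empty string "".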
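A side effect of replacing None with "" is that an entry whose host really is the empty string would come back as None. The following is only a sketch of one way around that, not part of the original answer: the names `sort_key` and `rows` are hypothetical, and the key carries an extra boolean so that None and "" stay distinct while remaining sortable.

```python
from itertools import groupby


def sort_key(d):
    # The boolean records whether host was None; the last element is a
    # str in every case, so the tuples can always be compared.
    host = d["host"]
    return d["foo"], d["bar"], host is None, host or ""


# Hypothetical sample rows: two with host None, one with host "".
rows = [
    {"foo": "A", "bar": "B", "host": None, "count": 1},
    {"foo": "A", "bar": "B", "host": "", "count": 2},
    {"foo": "A", "bar": "B", "host": None, "count": 3},
]

merged = [
    {"foo": f, "bar": b, "host": None if is_none else h,
     "count": sum(d["count"] for d in group)}
    for (f, b, is_none, h), group in groupby(sorted(rows, key=sort_key), key=sort_key)
]
print(merged)
```

Here the two None rows are summed together, and the row with the empty-string host stays a separate entry.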

Multiple dictionaries

If you have multiple dictionaries you can change the first approach to:

# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
groups = defaultdict(list, {(d['foo'], d['bar'], d['host']): [d] for d in dictB['stdout']})

# create a list with all the dictionaries from multiple dict
data = []
lst = [dictA]  # change this line to contain all the dictionaries except B
for d in lst:
    data.extend(d["stdout"])

for d in data:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)

# use item getter for better readability
count = itemgetter("count")

# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]

# sort the list of dictionaries in decreasing order
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
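If the intermediate lists are not needed, the same idea can be sketched as one function that takes any number of dictionaries and sums the counts directly. This is a minimal sketch rather than the code from the answer above: the name `merge_counts` and the shortened sample data are hypothetical, and it assumes every input has a 'stdout' list whose entries all contain 'foo', 'bar', 'host' and 'count'.

```python
from collections import defaultdict
from operator import itemgetter


def merge_counts(*dicts):
    """Sum 'count' over all entries sharing the same (foo, bar, host)."""
    totals = defaultdict(int)
    for d in dicts:
        for entry in d["stdout"]:
            totals[(entry["foo"], entry["bar"], entry["host"])] += entry["count"]
    merged = [
        {"foo": f, "bar": b, "host": h, "count": c}
        for (f, b, h), c in totals.items()
    ]
    # Largest counts first, as in the expected output of the question.
    merged.sort(key=itemgetter("count"), reverse=True)
    return {"stdout": merged, "stderr": ""}


# Shortened, hypothetical sample inputs.
a = {"stdout": [{"foo": "A", "bar": "B", "host": None, "count": 135},
                {"foo": "D", "bar": "E", "host": "apple", "count": 2}],
     "stderr": ""}
b = {"stdout": [{"foo": "A", "bar": "B", "host": None, "count": 280},
                {"foo": "D", "bar": "E", "host": "apple", "count": 2},
                {"foo": "A", "bar": "B", "host": "orange", "count": 46}],
     "stderr": ""}

result = merge_counts(a, b)
print(result)
```

Because the tuple key is used as-is, None needs no special handling here. With dozens of dictionaries in a list, the call would be `merge_counts(*list_of_dicts)`.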

What is itemgetter?

From the documentation:

Return a callable object that fetches item from its operand using the operand's __getitem__() method. If multiple items are specified, returns a tuple of lookup values.

It is equivalent to:

def itemgetter(*items):
    if len(items) == 1:
        item = items[0]
        def g(obj):
            return obj[item]
    else:
        def g(obj):
            return tuple(obj[item] for item in items)
    return g
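As a short illustration of both cases in the quoted documentation, the sketch below calls itemgetter with one item and with several. The variable names `row`, `count` and `group_key` are only illustrative.

```python
from operator import itemgetter

row = {"foo": "A", "bar": "B", "host": None, "count": 135}

# One item: the callable returns the value itself.
count = itemgetter("count")
single = count(row)

# Several items: the callable returns a tuple of the values.
group_key = itemgetter("foo", "bar", "host")
multiple = group_key(row)

print(single, multiple)
```

The multi-item form could also stand in for the hand-written `(d['foo'], d['bar'], d['host'])` tuples used as grouping keys in the first approach.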