I have several dictionaries (perhaps tens of them) shaped like the one below:
{'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
{'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 1},
{'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1},
{'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
'stderr': ''}
I want to combine all of these dictionaries by summing the 'count' values of entries that share the same 'foo', 'bar' and 'host' values (None is NoneType).
For example, given 2 dictionaries:
dictA = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 135},
{'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
{'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1},
{'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
'stderr': ''}
dictB = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 280},
{'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46},
{'foo': 'A', 'bar': 'E', 'host': 'pineapple', 'count': 3},
{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 2},
{'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 1}],
'stderr': ''}
Then the merged version should be
dictMerged = {'stdout': [{'foo': 'A', 'bar': 'B', 'host': None, 'count': 415},
{'foo': 'A', 'bar': 'B', 'host': 'orange', 'count': 46},
{'foo': 'C', 'bar': 'B', 'host': 'egg', 'count': 28},
{'foo': 'D', 'bar': 'E', 'host': 'apple', 'count': 4},
{'foo': 'A', 'bar': 'E', 'host': 'pineapple', 'count': 3},
{'foo': 'C', 'bar': 'F', 'host': 'carrot', 'count': 2},
{'foo': 'A', 'bar': 'E', 'host': 'chicken breast', 'count': 1}],
'stderr': ''}
Note that the order of the dictionaries in the list changed after the 'count' values were summed.
As a first step, I tried to combine them by 'host' as shown below, but the result was not what I wanted:
hostname1 = {i["host"]: i for i in dictA['stdout']}
hostname2 = {i["host"]: i for i in dictB['stdout']}
all_host = hostname1|hostname2
{key: value + b[key] for key, value in a.items()}
CodePudding user response:
One approach
from collections import defaultdict
from operator import itemgetter
# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
groups = defaultdict(list, {(d['foo'], d['bar'], d['host']): [d] for d in dictB['stdout']})
for d in dictA["stdout"]:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)
# use item getter for better readability
count = itemgetter("count")
# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]
# sort the list of dictionaries in decreasing order
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
print(res)
Output
{'stderr': '',
'stdout': [{'bar': 'B', 'count': 415, 'foo': 'A', 'host': None},
{'bar': 'B', 'count': 46, 'foo': 'A', 'host': 'orange'},
{'bar': 'B', 'count': 28, 'foo': 'C', 'host': 'egg'},
{'bar': 'E', 'count': 4, 'foo': 'D', 'host': 'apple'},
{'bar': 'E', 'count': 3, 'foo': 'A', 'host': 'pineapple'},
{'bar': 'F', 'count': 2, 'foo': 'C', 'host': 'carrot'},
{'bar': 'E', 'count': 1, 'foo': 'A', 'host': 'chicken breast'}]}
For more on each of the functions and data structures used in the code above, see: sorted, defaultdict and itemgetter.
One alternative
Use groupby:
import pprint
from operator import itemgetter
from itertools import groupby
def key(d):
    return d["foo"], d["bar"], d["host"] or ""
count = itemgetter("count")
lst = sorted(dictA["stdout"] + dictB["stdout"], key=key)
groups = groupby(lst, key=key)
ds = [{'foo': f, 'bar': b, 'host': h or None, 'count': sum(count(d) for d in vs)} for (f, b, h), vs in groups]
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
print(res)
This second approach has two caveats:
- The time complexity is O(n log n), whereas the first one was O(n).
- In order to sort the list of dictionaries, it needs to replace None with the empty string "".
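To illustrate the second caveat, here is a minimal sketch of why the key function swaps None for "". The sample keys are made up for the demonstration; the point is only that Python 3 cannot order None against a str when comparing tuples.

```python
# Tuples compare element by element, so a key holding None cannot be
# ordered against a key holding a string in the same position.
keys = [("A", "B", None), ("A", "B", "orange")]

try:
    sorted(keys)
    mixed_sort_failed = False
except TypeError:
    # Raised because None and str do not support "<" with each other.
    mixed_sort_failed = True

# Replacing None with "" makes every key comparable.
normalized = sorted((f, b, h or "") for f, b, h in keys)

print(mixed_sort_failed)
print(normalized)
```

One consequence worth keeping in mind: because the groupby version later maps "" back to None, an entry whose host is genuinely the empty string would end up grouped with the None entries.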
Multiple dictionaries
If you have multiple dictionaries you can change the first approach to:
# create a dictionary (defaultdict) to put the dictionaries with matching foo, bar, host in the same list
groups = defaultdict(list, {(d['foo'], d['bar'], d['host']): [d] for d in dictB['stdout']})
# create a list with all the dictionaries from multiple dict
data = []
lst = [dictA] # change this line to contain all the dictionaries except B
for d in lst:
    data.extend(d["stdout"])
for d in data:
    key = (d['foo'], d['bar'], d['host'])
    groups[key].append(d)
# use item getter for better readability
count = itemgetter("count")
# create new list of dictionaries, sum the count values
ds = [{'foo': f, 'bar': b, 'host': h, 'count': sum(count(d) for d in v)} for (f, b, h), v in groups.items()]
# sort the list of dictionaries in decreasing order
res = {"stdout": sorted(ds, key=count, reverse=True), "stderr": ""}
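If you would rather not treat one dictionary specially, the same idea can be wrapped in a function that takes any number of dictionaries. This is only a sketch of a possible variant, not part of the approaches above: the name merge_counts is hypothetical, it uses collections.Counter instead of defaultdict(list), and it assumes "stderr" can simply be reset to "" as in the code above. The sample data is shortened from the question.

```python
from collections import Counter


def merge_counts(dicts):
    """Sum 'count' over entries sharing the same (foo, bar, host) key."""
    totals = Counter()
    for d in dicts:
        for entry in d["stdout"]:
            key = (entry["foo"], entry["bar"], entry["host"])
            totals[key] += entry["count"]
    # most_common() yields (key, total) pairs in decreasing order of total
    merged = [
        {"foo": f, "bar": b, "host": h, "count": c}
        for (f, b, h), c in totals.most_common()
    ]
    # Assumption: stderr is not merged, it is just reset to ""
    return {"stdout": merged, "stderr": ""}


a = {"stdout": [{"foo": "A", "bar": "B", "host": None, "count": 135},
                {"foo": "D", "bar": "E", "host": "apple", "count": 2}],
     "stderr": ""}
b = {"stdout": [{"foo": "A", "bar": "B", "host": None, "count": 280},
                {"foo": "D", "bar": "E", "host": "apple", "count": 2},
                {"foo": "A", "bar": "B", "host": "orange", "count": 46}],
     "stderr": ""}

merged = merge_counts([a, b])
print(merged)
```

Since the tuple keys are only hashed and never compared with each other, None needs no special handling here.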
What is itemgetter?
From the documentation:
Return a callable object that fetches item from its operand using the operand’s __getitem__() method. If multiple items are specified, returns a tuple of lookup values.
Is equivalent to:
def itemgetter(*items):
    if len(items) == 1:
        item = items[0]
        def g(obj):
            return obj[item]
    else:
        def g(obj):
            return tuple(obj[item] for item in items)
    return g
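As a quick illustration of both cases (one item versus several), here is a small sketch applied to a single entry from the question. The variable names are just for the demonstration.

```python
from operator import itemgetter

row = {"foo": "A", "bar": "B", "host": None, "count": 135}

# One item: the callable returns the value itself
count = itemgetter("count")
single_value = count(row)

# Several items: the callable returns a tuple of the looked-up values
group_key = itemgetter("foo", "bar", "host")
key_tuple = group_key(row)

print(single_value)  # 135
print(key_tuple)     # ('A', 'B', None)
```

The multi-item form could also stand in for the hand-written key tuple (d['foo'], d['bar'], d['host']) used in the first approach.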