I have a dictionary with arbitrary key names and values that are lists corresponding to nodes in those keys:
{'foo': ["1", "2", "7"], 'bar': ["4", "8", "7"], 'baz': ["5", "6"]}
I need to consolidate this dictionary such that if ANY of the items in the list are in another list, the two lists combine into one key:
Desired Output:
{'foo': ["1", "2", "4", "7", "7", "8"], 'baz': ["5", "6"]}
I currently have the following code, but at the scale of the dictionary it will take far too long to run:
def consolidate(orig_dict):
consolidated = {}
removed = set()
for key, val in orig_dict.items():
if key in removed:
continue
consolidated.setdefault(key, val)
for item in val:
for key2, val2 in orig_dict.items():
if key == key2:
continue
if key2 in removed:
continue
if item in val2:
val.extend(val2)
removed.add(key)
removed.add(key2)
return consolidated
I am specifically looking any way to lessen time complexity, and am open to suggestions.
CodePudding user response:
Your nested loop is going through ALL the keys again. You don't need to check the preceding keys again, so you should keep track of the key position (with enumerate
) and only check the following elements. Beside you can make a set from val
to increase speed of membership testing:
def consolidate(orig_dict):
consolidated = dict()
removed = set()
for i, (key, val) in enumerate(orig_dict.items()):
if key in removed:
continue
consolidated.setdefault(key, val)
val_set = set(val)
for key2, val2 in [(k,v) for k,v in list(orig_dict.items())[i 1:] if k not in removed]:
if any(item in val_set for item in val2):
consolidated[key] = val2
removed.add(key2)
return consolidated
CodePudding user response:
You can use sets and set intersections to speed up the operations.
a = {'foo': ["1", "2", "7"], 'bar': ["4", "8", "7"], 'baz': ["5", "6"]}
consolidated = {}
joined_keys = set()
for k1, v1 in a.items():
if k1 in joined_keys:
continue
joined_keys.add(k1)
set_v1 = set(v1)
for k2, v2 in a.items():
if k2 not in joined_keys and set(v2) & set_v1:
v1.extend(v2)
joined_keys.add(k2)
consolidated[k1]=v1
print(consolidated)