Home > database >  Consolidating keys in dictionary based on items in corresponding value lists
Consolidating keys in dictionary based on items in corresponding value lists

Time:12-01

I have a dictionary with arbitrary key names and values that are lists corresponding to nodes in those keys:

{'foo': ["1", "2", "7"], 'bar': ["4", "8", "7"], 'baz': ["5", "6"]}

I need to consolidate this dictionary such that if ANY of the items in the list are in another list, the two lists combine into one key:

Desired Output:

{'foo': ["1", "2", "4", "7", "7", "8"], 'baz': ["5", "6"]}

I currently have the following code, but at the scale of the dictionary it will take far too long to run:

def consolidate(orig_dict):
    consolidated = {}
    removed = set()

    for key, val in orig_dict.items():
        if key in removed:
            continue
        consolidated.setdefault(key, val)

        for item in val:
            for key2, val2 in orig_dict.items():
                if key == key2:
                    continue
                if key2 in removed:
                    continue
                if item in val2:
                    val.extend(val2)
                    removed.add(key)
                    removed.add(key2)

     return consolidated

I am specifically looking any way to lessen time complexity, and am open to suggestions.

CodePudding user response:

Your nested loop is going through ALL the keys again. You don't need to check the preceding keys again, so you should keep track of the key position (with enumerate) and only check the following elements. Beside you can make a set from val to increase speed of membership testing:

def consolidate(orig_dict):
    consolidated = dict()
    removed = set()
    for i, (key, val) in enumerate(orig_dict.items()):
        if key in removed:
            continue
        consolidated.setdefault(key, val)
        val_set = set(val)
        for key2, val2 in [(k,v) for k,v in list(orig_dict.items())[i 1:] if k not in removed]:
            if any(item in val_set for item in val2):
                consolidated[key]  = val2
                removed.add(key2)

    return consolidated

CodePudding user response:

You can use sets and set intersections to speed up the operations.

a = {'foo': ["1", "2", "7"], 'bar': ["4", "8", "7"], 'baz': ["5", "6"]}
consolidated = {}
joined_keys = set()
for k1, v1 in a.items():
  if k1 in joined_keys:
    continue
  joined_keys.add(k1)
  set_v1 = set(v1)
  
  for k2, v2 in a.items():
    if k2 not in joined_keys and set(v2) & set_v1:
      v1.extend(v2)
      joined_keys.add(k2)
  consolidated[k1]=v1
print(consolidated)
  • Related