Get differences and similarities between two dicts/lists

Time:05-03

I have two configuration files that are essentially two YAML lists of dicts.

config_one: [{"ip": "0.0.0.0/24", "id": 1, "name": "First"},{"ip": "0.0.0.2/24", "id": 2, "name": "Second"},{"ip": "0.0.0.3/24", "id": 3, "name": "Third"}]
config_two: [{"ip": "0.0.0.3/24", "id": 30, "name": "Third"},{"ip": "0.0.0.0/24", "id": 1,"name": "First"}, {"ip": "0.0.0.2/24", "id": 2, "name": "Second"}]

I would like to compare these two config files with each other and write/print the similarities and differences. To make it even more fun, let's say "config_one" is "the truth" and, if there is a difference, I would also like to print what it should be. Something along the lines of:

If there is a match:

"First config - 0.0.0.0/24 - id: 1 can be found in config_two and is in line with expected "First config - 0.0.0.0/24 - id: 1" entry found in config_one"

If there is a difference between "as is" config_two and "should be" config_one:

"Third config - 0.0.0.3/24 - id: 30 can be found in config_two but is not in line with expected "Third config - 0.0.0.3/24 - id: 3" entry found in config one"

I tried playing around with some nested for loops but got stuck: I could not find a way to address the keys and values in the second list without ending up in an "endless loop".

   for i in config_one:
       for j in config_two:
           # Compares every entry with every other entry, so unrelated pairs are reported as mismatches.
           if i == j:
               print(i['name'], i['ip'], i['id'], "matches", j['name'], j['ip'], j['id'])
           else:
               print(i['name'], i['ip'], i['id'], "does not match, it should be", j['name'], j['ip'], j['id'])
   

Any idea how I could tackle this?

CodePudding user response:

If the config can fit into memory, then using dictionaries makes this pretty easy. (I'm assuming that we need to match up the individual configs according to a key - e.g. "ip" or "name", etc.)

config_one = [{"ip": "0.0.0.0/24", "id": 1, "name": "First"},{"ip": "0.0.0.2/24", "id": 2, "name": "Second"},{"ip": "0.0.0.3/24", "id": 3, "name": "Third"}]
config_two = [{"ip": "0.0.0.3/24", "id": 30, "name": "Third"},{"ip": "0.0.0.0/24", "id": 1,"name": "First"}, {"ip": "0.0.0.2/24", "id": 2, "name": "Second"}]

def compare_configs(config_one, config_two, key):
    matches = []
    differences = []
    missing = []
    # Index config_two by the chosen key for fast lookups (assumes key values are unique).
    lookup = {item[key]: item for item in config_two}

    for item in config_one:
        if item[key] in lookup:
            is_match = item == lookup[item[key]]
            if is_match:
                matches.append(item)
            else:
                differences.append((item, lookup[item[key]]))
        else:
            missing.append(item)

    return matches, differences, missing

matches, differences, missing = compare_configs(config_one, config_two, "ip")
print(matches)
print(differences)
print(missing)

This is the result:

[{'ip': '0.0.0.0/24', 'id': 1, 'name': 'First'}, {'ip': '0.0.0.2/24', 'id': 2, 'name': 'Second'}]
[({'ip': '0.0.0.3/24', 'id': 3, 'name': 'Third'}, {'ip': '0.0.0.3/24', 'id': 30, 'name': 'Third'})]
[]

Here I create three lists: matches, differences, and missing.

  • matches contains all those configs that are the same in each list
  • differences contains configs that match on a key, but some other value is different
  • missing contains configs in config_one that aren't in config_two
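To get messages in the wording the question asks for, the same lookup idea can be wrapped in a small reporting function. This is only a minimal sketch and not part of the answer above: `describe` and `report` are hypothetical helper names, and it assumes the chosen key ("ip" here) is unique within each config.

```python
config_one = [
    {"ip": "0.0.0.0/24", "id": 1, "name": "First"},
    {"ip": "0.0.0.2/24", "id": 2, "name": "Second"},
    {"ip": "0.0.0.3/24", "id": 3, "name": "Third"},
]
config_two = [
    {"ip": "0.0.0.3/24", "id": 30, "name": "Third"},
    {"ip": "0.0.0.0/24", "id": 1, "name": "First"},
    {"ip": "0.0.0.2/24", "id": 2, "name": "Second"},
]


def describe(item):
    # Hypothetical formatter mirroring the wording used in the question.
    return f"{item['name']} config - {item['ip']} - id: {item['id']}"


def report(expected, actual, key):
    # Index the "as is" config by key; assumes key values are unique.
    lookup = {item[key]: item for item in actual}
    lines = []
    for want in expected:
        have = lookup.get(want[key])
        if have is None:
            # No entry with this key exists in the "as is" config.
            lines.append(f"{describe(want)} is missing from config_two")
        elif have == want:
            lines.append(
                f'{describe(have)} can be found in config_two and is in line '
                f'with expected "{describe(want)}" entry found in config_one'
            )
        else:
            lines.append(
                f'{describe(have)} can be found in config_two but is not in line '
                f'with expected "{describe(want)}" entry found in config_one'
            )
    return lines


for line in report(config_one, config_two, "ip"):
    print(line)
```

Because `config_one` is treated as the truth, it is passed as `expected`; swapping the arguments would report deviations the other way round.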

CodePudding user response:

You could try converting the lists into sets, which are easier to compare. The problem is that sets only take hashable objects, so you can't have a set of dictionaries. With simple dictionaries, however, you can turn each one into an equivalent data structure that can be made hashable, e.g. a set of (key, value) tuples. For example, the second element of config_one is equivalent to the set {('id', 2), ('ip', '0.0.0.2/24'), ('name', 'Second')}. Another hurdle is that you need a set of such sets for each configuration, and plain sets are not hashable themselves, so you need to turn the inner sets into frozensets.

With all that out of the way, you can try something like this:

>>> c1 = {frozenset(e.items()) for e in config_one}
>>> c2 = {frozenset(e.items()) for e in config_two}
>>> c1.symmetric_difference(c2)
{frozenset({('id', 3), ('ip', '0.0.0.3/24'), ('name', 'Third')}),
 frozenset({('id', 30), ('ip', '0.0.0.3/24'), ('name', 'Third')})}
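The same sets can also give the similarities and the one-sided differences. The following is a sketch under the assumption that every value in the dicts is hashable (strings and integers here); the names `same`, `only_in_one`, and `only_in_two` are illustrative and do not come from the answer.

```python
config_one = [
    {"ip": "0.0.0.0/24", "id": 1, "name": "First"},
    {"ip": "0.0.0.2/24", "id": 2, "name": "Second"},
    {"ip": "0.0.0.3/24", "id": 3, "name": "Third"},
]
config_two = [
    {"ip": "0.0.0.3/24", "id": 30, "name": "Third"},
    {"ip": "0.0.0.0/24", "id": 1, "name": "First"},
    {"ip": "0.0.0.2/24", "id": 2, "name": "Second"},
]

# Each dict becomes a frozenset of its (key, value) pairs so it can live in a set.
c1 = {frozenset(e.items()) for e in config_one}
c2 = {frozenset(e.items()) for e in config_two}

# dict() rebuilds a dictionary from each frozenset of (key, value) pairs.
same = [dict(fs) for fs in c1 & c2]         # entries identical in both configs
only_in_one = [dict(fs) for fs in c1 - c2]  # expected entries with no exact match
only_in_two = [dict(fs) for fs in c2 - c1]  # actual entries that deviate

print(same)
print(only_in_one)
print(only_in_two)
```

Sets are unordered, so the order of `same` may vary between runs. The set operations also do not say which deviating entry corresponds to which expected entry; pairing them up still requires matching on a key such as "ip".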
