Merge dictionaries with the same key from two lists of dicts in Python

I have two lists containing dictionaries, as shown below. I am trying to merge them into one based on a shared key-value pair (the "id" field).

{
   "name":"harry",
   "properties":[
      {
         "id":"N3",
         "status":"OPEN",
         "type":"energetic"
      },
      {
         "id":"N5",
         "status":"OPEN",
         "type":"hot"
      }
   ]
}

And the other list:

{
   "name":"harry",
   "properties":[
      {
         "id":"N3",
         "type":"energetic",
         "language": "english"
      },
      {
         "id":"N6",
         "status":"OPEN",
         "type":"cool"
      }
   ]
}

The output I am trying to achieve is:

{
   "name":"harry",
   "properties":[
      {
         "id":"N3",
         "status":"OPEN",
         "type":"energetic",
         "language": "english"
      },
      {
         "id":"N5",
         "status":"OPEN",
         "type":"hot"
      },
      {
         "id":"N6",
         "status":"OPEN",
         "type":"cool"
      }
   ]
}

Since id N3 appears in both lists, those two dicts should be merged, keeping all of their fields. So far I have tried using itertools and the following:

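ds = [d1, d2]
d = {}
for k in d1.keys():
    # Collects each key's values into a tuple, e.g. ('harry', 'harry'),
    # instead of merging the property dicts by id
    d[k] = tuple(x[k] for x in ds)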

Could someone please help me figure this out?

Answer:

It might help to treat each of the two objects as an element of its own list, in case you have other objects with different name values.

Then you could do a left outer join on both name and id keys:

#!/usr/bin/env python

a = [
    {
        "name": "harry",
        "properties": [
            {
                "id":"N3",
                "status":"OPEN",
                "type":"energetic"
            },
            {
                "id":"N5",
                "status":"OPEN",
                "type":"hot"
            }
        ]
    }
]

b = [
    {
        "name": "harry",
        "properties": [
            {
                "id":"N3",
                "type":"energetic",
                "language": "english"
            },
            {
                "id":"N6",
                "status":"OPEN",
                "type":"cool"
            }
        ]
    }
]

a_names = set()
a_prop_ids_by_name = {}
a_by_name = {}
for ao in a:
    an = ao['name']
    a_names.add(an)
    if an not in a_prop_ids_by_name:
        a_prop_ids_by_name[an] = set()
    for ap in ao['properties']:
        api = ap['id']
        a_prop_ids_by_name[an].add(api)
    a_by_name[an] = ao

res = []

for bo in b:
    bn = bo['name']
    if bn not in a_names:
        res.append(bo)
    else:
        ao = a_by_name[bn]
        bp = bo['properties']
        for bpo in bp:
            if bpo['id'] not in a_prop_ids_by_name[bn]:
                ao['properties'].append(bpo)
        res.append(ao)

print(res)

The idea is to first process list a, collecting its names and, for each name, the ids of its properties. The names and ids-by-name are stored in Python sets, so their members are always unique.
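The first loop, which builds these lookups, could also be written more compactly with collections.defaultdict. This is only a sketch of that idiom, with list a redefined so the snippet runs on its own; the keys of a_by_name then serve as the set of names, so a separate a_names set is not strictly needed.

```python
from collections import defaultdict

a = [
    {
        "name": "harry",
        "properties": [
            {"id": "N3", "status": "OPEN", "type": "energetic"},
            {"id": "N5", "status": "OPEN", "type": "hot"},
        ],
    }
]

# name -> set of property ids; a name seen for the first time starts as set()
a_prop_ids_by_name = defaultdict(set)
# name -> the object itself; its keys double as the set of names in a
a_by_name = {}
for ao in a:
    a_prop_ids_by_name[ao["name"]].update(p["id"] for p in ao["properties"])
    a_by_name[ao["name"]] = ao
```

With this version, the check `bn not in a_names` in the join loop becomes `bn not in a_by_name`.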

Once you have these sets, you can do the left outer join on the contents of list b.

Either an object in b has a name that no object in a shares, in which case you add that object to the result as-is, or an object in b shares its name with an object in a. In that case, you iterate over the b object's properties and look for ids not already in a's ids-by-name set. You add the missing properties to the a object, and then add that updated object to the result.

Output:

[{'name': 'harry', 'properties': [{'id': 'N3', 'status': 'OPEN', 'type': 'energetic'}, {'id': 'N5', 'status': 'OPEN', 'type': 'hot'}, {'id': 'N6', 'status': 'OPEN', 'type': 'cool'}]}]

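This doesn't do any error checking on input, and it relies on name values being unique within each list, so if the same name appears more than once, you may get incorrect or unexpected output. Note also that when an id appears in both lists, only a's copy of that property is kept (which is why N3 has no "language" field in the output above), and objects whose name appears only in a are not added to the result.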
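If you also want the fields of a shared id combined, as in the expected output in the question, the join can update the matching property instead of skipping it. Below is a minimal sketch of that variant, not a drop-in replacement: the function name join_on_name_and_id is hypothetical, it deep-copies its inputs rather than mutating a, it keeps objects whose name appears only in a, and on a conflicting key b's value wins.

```python
import copy


def join_on_name_and_id(a, b):
    """Hypothetical helper: join two lists of {"name", "properties"} objects.

    Objects are matched on "name"; properties of matched objects are matched
    on "id" and their fields combined (b's value wins on a conflicting key).
    Objects whose name appears in only one list are kept as they are.
    """
    # Work on a deep copy so the caller's lists are not mutated
    res = copy.deepcopy(a)
    by_name = {obj["name"]: obj for obj in res}
    for bo in b:
        ao = by_name.get(bo["name"])
        if ao is None:
            # Name only in b: add a copy of the whole object
            new_obj = copy.deepcopy(bo)
            res.append(new_obj)
            by_name[new_obj["name"]] = new_obj
            continue
        # Index the a object's properties by id for in-place updates
        props_by_id = {p["id"]: p for p in ao["properties"]}
        for bp in bo["properties"]:
            if bp["id"] in props_by_id:
                # Shared id: merge b's fields into a's dict
                props_by_id[bp["id"]].update(bp)
            else:
                # New id: append a copy of b's property
                new_prop = dict(bp)
                ao["properties"].append(new_prop)
                props_by_id[new_prop["id"]] = new_prop
    return res


a = [{"name": "harry", "properties": [
    {"id": "N3", "status": "OPEN", "type": "energetic"},
    {"id": "N5", "status": "OPEN", "type": "hot"}]}]
b = [{"name": "harry", "properties": [
    {"id": "N3", "type": "energetic", "language": "english"},
    {"id": "N6", "status": "OPEN", "type": "cool"}]}]

print(join_on_name_and_id(a, b))
```

For the sample data this yields N3 with "status", "type" and "language" together, followed by N5 and N6, while a and b themselves are left unchanged.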

Answer:

Here is one approach:

a = {
   "name":"harry",
   "properties":[
      {
         "id":"N3",
         "status":"OPEN",
         "type":"energetic"
      },
      {
         "id":"N5",
         "status":"OPEN",
         "type":"hot"
      }
   ]
}
b = {
   "name":"harry",
   "properties":[
      {
         "id":"N3",
         "type":"energetic",
         "language": "english"
      },
      {
         "id":"N6",
         "status":"OPEN",
         "type":"cool"
      }
   ]
}

# Map each id to its index in the respective properties list
a_ids = {item['id']: index for index,item in enumerate(a['properties'])} #{'N3': 0, 'N5': 1}
b_ids = {item['id']: index for index,item in enumerate(b['properties'])} #{'N3': 0, 'N6': 1}

# Loop through the ids from a
for pid in a_ids:
    # If the same id exists in b, merge a's fields into b's dict
    if pid in b_ids:
        b['properties'][b_ids[pid]].update(a['properties'][a_ids[pid]])
    # Otherwise, append a's dict to b's properties
    else:
        b['properties'].append(a['properties'][a_ids[pid]])

print(b)

Output:

{'name': 'harry', 'properties': [{'id': 'N3', 'type': 'energetic', 'language': 'english', 'status': 'OPEN'}, {'id': 'N6', 'status': 'OPEN', 'type': 'cool'}, {'id': 'N5', 'status': 'OPEN', 'type': 'hot'}]}
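When there is only one object on each side, as in the question, the index bookkeeping can be replaced by a single dict keyed on id. This is a sketch under a few assumptions: the function name merge_properties is hypothetical, both objects are assumed to share the same name, and it relies on Python 3.7+ dicts preserving insertion order, so the properties come out as N3, N5, N6 like the expected output. It builds new dicts instead of modifying a or b. Unlike the update call above, b's values overwrite a's on a conflicting key here.

```python
def merge_properties(a, b):
    """Hypothetical helper: merge two {"name", "properties"} dicts by property id."""
    merged = {}
    # a's properties first, then b's: first-seen order decides the output order
    for prop in a["properties"] + b["properties"]:
        # setdefault starts a new dict the first time an id is seen;
        # update then copies in this property's fields (later values win)
        merged.setdefault(prop["id"], {}).update(prop)
    return {"name": a["name"], "properties": list(merged.values())}


a = {"name": "harry", "properties": [
    {"id": "N3", "status": "OPEN", "type": "energetic"},
    {"id": "N5", "status": "OPEN", "type": "hot"}]}
b = {"name": "harry", "properties": [
    {"id": "N3", "type": "energetic", "language": "english"},
    {"id": "N6", "status": "OPEN", "type": "cool"}]}

print(merge_properties(a, b))
```

Iterating over b's properties before a's would let a's values win instead, at the cost of changing the output order.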