How to merge duplicate dicts in list in Python

Time:09-02

I have the list below:

lst = [
       {'name': 'bezel', 'conf': 0.67},
       {'name': 'plate', 'conf': 0.69},
       {'name': 'bezel', 'conf': 0.65},
       {'name': 'plate', 'conf': 0.46},
       {'name': 'bezel', 'conf': 0.42}
]

The list above contains duplicate dicts that share the names bezel and plate. There can be any number of names in the list. I want to remove these duplicates and keep only the dict with the highest conf for each name. So the output would look like this:

lst = [
       {'name': 'bezel', 'conf': 0.67},
       {'name': 'plate', 'conf': 0.69}
]

I can use multiple for and if loops to get the output, but is there an easier way of doing this? Thanks

Below is what I have done so far

newLst = []  # creating new list to save data
for item in lst:
    if any(d['name'] == item['name'] for d in newLst):  # Checking if current item exist in newLst
        idx = next((index for (index, d) in enumerate(newLst) if d["name"] == item['name']), None)  # Get the index of current item name from newLst
        if newLst[idx]['conf'] < item['conf']:  # Check if its greater conf
            del newLst[idx]
            newLst.append({'name': item['name'], 'conf': item['conf']})
    else:
        newLst.append({'name': item['name'], 'conf': item['conf']})   # If not add current item

print(newLst)

CodePudding user response:

You can do:

result=[]
for n in {d['name'] for d in lst}:
    sl=[e for e in lst if e['name']==n]
    result.append(max(sl, key=lambda x:x['conf']))

>>> result
[{'name': 'plate', 'conf': 0.69}, {'name': 'bezel', 'conf': 0.67}]

  1. {d['name'] for d in lst} creates a set of all the 'name' values contained in the list;

  2. sl=[e for e in lst if e['name']==n] filters the list for that name;

  3. max(sl, key=lambda x:x['conf']) finds the max of that filtered list, keyed by the 'conf' value.

Profit!


If preserving order is important, use a dict rather than a set to uniquify the names (dicts keep insertion order in Python 3.7+, sets do not):

result=[]
for n in {d['name']:None for d in lst}:
    sl=[e for e in lst if e['name']==n]
    result.append(max(sl, key=lambda x:x['conf']))

>>> result
[{'name': 'bezel', 'conf': 0.67}, {'name': 'plate', 'conf': 0.69}]

There is also the sort / uniquify method:

>>> list({d['name']:d for d in sorted(lst, key=lambda d: (d['name'], d['conf']))}.values())
[{'name': 'bezel', 'conf': 0.67}, {'name': 'plate', 'conf': 0.69}]
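
One thing to keep in mind with the sort / uniquify method: the result comes out ordered by name, not by first appearance in the input. A minimal sketch of that behaviour, where the function name dedupe_sorted and the sample data are made up for illustration and are not from the question:

```python
def dedupe_sorted(items):
    # Sort by (name, conf) so the highest-conf dict for each name comes last,
    # then let later dicts overwrite earlier ones in the dict comprehension.
    ordered = sorted(items, key=lambda d: (d['name'], d['conf']))
    return list({d['name']: d for d in ordered}.values())

# Hypothetical data where 'plate' appears before 'bezel'
sample = [
    {'name': 'plate', 'conf': 0.46},
    {'name': 'bezel', 'conf': 0.42},
    {'name': 'plate', 'conf': 0.69},
]

print(dedupe_sorted(sample))
# [{'name': 'bezel', 'conf': 0.42}, {'name': 'plate', 'conf': 0.69}]
```

Here 'bezel' comes first in the result even though 'plate' comes first in the input, because the sort puts names in alphabetical order.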

CodePudding user response:

You can get a compact answer by building a dictionary keyed by name. If a name is not in the dict yet, add it with its conf; otherwise update the stored maximum if needed.

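def filter_on_key(lst):
    tmp = {}  # maps each name to the highest conf seen so far
    for d in lst:
        if tmp.get(d['name']) is None:
            # First time we see this name
            tmp[d['name']] = d['conf']
        elif d['conf'] > tmp[d['name']]:
            # Found a higher conf for a known name
            tmp[d['name']] = d['conf']
    return tmp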

Gets you:

out = filter_on_key(lst)
{'bezel': 0.67, 'plate': 0.69}

If you want to get back the original format, a comprehension works fine:

res = [{'name':k, 'conf':v} for k, v in out.items()]
[{'name': 'bezel', 'conf': 0.67}, {'name': 'plate', 'conf': 0.69}]
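
The same idea can probably be shortened further with dict.get and max. This is only a sketch; the function name max_conf_by_name is made up here and is not part of the answer above:

```python
def max_conf_by_name(items):
    best = {}
    for d in items:
        name = d['name']
        # best.get falls back to the current conf when the name is new,
        # so max() then simply stores that conf.
        best[name] = max(best.get(name, d['conf']), d['conf'])
    return best

lst = [
    {'name': 'bezel', 'conf': 0.67},
    {'name': 'plate', 'conf': 0.69},
    {'name': 'bezel', 'conf': 0.65},
    {'name': 'plate', 'conf': 0.46},
    {'name': 'bezel', 'conf': 0.42},
]

out = max_conf_by_name(lst)
res = [{'name': k, 'conf': v} for k, v in out.items()]
print(res)
# [{'name': 'bezel', 'conf': 0.67}, {'name': 'plate', 'conf': 0.69}]
```

Like filter_on_key, this keeps only 'name' and 'conf'. If the dicts carried other keys, those would be lost when rebuilding the list.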

CodePudding user response:

You can use a dict to store the input dicts by name. Then you just need to remove and replace an entry whenever a later dict with the same name has a higher 'conf' value.

accumulator = {}
for d in lst:
    key = d['name']
    if key not in accumulator:
        accumulator[key] = d
    elif d['conf'] > accumulator[key]['conf']:
        del accumulator[key]
        accumulator[key] = d
result = list(accumulator.values())

Result:

[{'name': 'bezel', 'conf': 0.67}, {'name': 'plate', 'conf': 0.69}]

Note that this is stable, i.e. the dicts that are kept appear in the same relative order as they do in the input.
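
To see what that ordering means in practice, here is a small sketch that wraps the loop in a function and runs it on different data. The name keep_highest and the sample list are assumptions made for illustration only:

```python
def keep_highest(items):
    accumulator = {}
    for d in items:
        key = d['name']
        if key not in accumulator:
            accumulator[key] = d
        elif d['conf'] > accumulator[key]['conf']:
            # Deleting first moves the key to the end of the dict,
            # so the kept dict sits where it appeared in the input.
            del accumulator[key]
            accumulator[key] = d
    return list(accumulator.values())

# Hypothetical data: the best 'bezel' entry comes after 'plate'
sample = [
    {'name': 'bezel', 'conf': 0.42},
    {'name': 'plate', 'conf': 0.69},
    {'name': 'bezel', 'conf': 0.67},
]

print(keep_highest(sample))
# [{'name': 'plate', 'conf': 0.69}, {'name': 'bezel', 'conf': 0.67}]
```

Without the del line, the key would keep its original slot and the output would follow the order in which each name first appears instead.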
