How to deduplicate dictionaries that contain the same id in Python

I would like to deduplicate the dictionaries that contain the same "id" value.

List of dicts:

example = [{'term': 'potato', 'id': 10}, {'term': 'potatoes', 'id': 10}, {'term': 'apple', 'id': 7}]

Desired output:

example = [{'term': 'potato', 'id': 10}, {'term': 'apple', 'id': 7}]

So far I can only either remove all of the duplicates instead of keeping one, or remove only the dictionaries that are fully identical. What I want is to deduplicate the dictionaries that share the same id value, keeping one of them.

Example code (attempt):

import ast

new_list = []
seen_keys = set()
for term in example:
    # The dictionaries were stored as strings in a Solr database,
    # so each one has to be converted back to a dict first
    d = ast.literal_eval(term)
    if d['id'] not in seen_keys:
        new_list.append(d)
        seen_keys.add(d['id'])
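If the dictionaries really arrive as strings (as the comment in the attempt says they did from Solr), a minimal sketch is to parse each one with ast.literal_eval and keep only the first dict per id. The raw_docs values below are hypothetical stand-ins for the stored strings.

```python
import ast

# Hypothetical stand-ins for the string-encoded dicts returned by Solr
raw_docs = [
    "{'term': 'potato', 'id': 10}",
    "{'term': 'potatoes', 'id': 10}",
    "{'term': 'apple', 'id': 7}",
]

new_list = []
seen_keys = set()
for raw in raw_docs:
    d = ast.literal_eval(raw)  # parse the Python-literal string into a dict
    if d['id'] not in seen_keys:  # keep only the first dict for each id
        new_list.append(d)
        seen_keys.add(d['id'])

print(new_list)  # [{'term': 'potato', 'id': 10}, {'term': 'apple', 'id': 7}]
```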

Answer:

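Or use a one-liner list comprehension with enumerate. Note that it keeps the last dict for each id rather than the first, and that it rescans the rest of the list for every element, so it is O(n²):

>>> [d for i, d in enumerate(example) if d['id'] not in [x['id'] for x in example[i + 1:]]]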
[{'term': 'potatoes', 'id': 10}, {'term': 'apple', 'id': 7}]
>>>
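If you want the last dict for each id without rescanning the tail of the list for every element, a dict comprehension is a one-pass alternative, because later dicts overwrite earlier ones stored under the same id. This is a sketch; it orders the result by each id's first appearance, which can differ from the enumerate version when duplicates are not adjacent.

```python
example = [{'term': 'potato', 'id': 10}, {'term': 'potatoes', 'id': 10}, {'term': 'apple', 'id': 7}]

# Later dicts overwrite earlier ones with the same id, so the last one wins;
# dicts keep insertion order (Python 3.7+), so ids stay in first-seen order.
last_per_id = list({d['id']: d for d in example}.values())
print(last_per_id)  # [{'term': 'potatoes', 'id': 10}, {'term': 'apple', 'id': 7}]
```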

Answer:

You can try this:

example = [
    {"term": "potato", "id": 10},
    {"term": "potatoes", "id": 10},
    {"term": "apple", "id": 7},
]

ids = set()

for item in example:
    ids.add(item["id"])

results = []

for item in example:
    if item["id"] in ids:
        results.append(item)
        ids.remove(item["id"])

print(results)

Answer:

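It can be done as easily as this (the list comprehension is used only for its side effect of appending to res, and each membership check rescans res):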

test_list = [{'term': 'potato', 'id': 10}, {'term': 'potatoes', 'id': 10}, {'term': 'apple', 'id': 7}]


res = []
[res.append(x) for x in test_list if x['id'] not in [y['id'] for y in res]]
print(res)

Answer:

If the list already holds dicts rather than strings, there is no need to use ast.literal_eval:

example = [{'term': 'potato', 'id': 10}, {'term': 'potatoes', 'id': 10}, {'term': 'apple', 'id': 7}]

seen_keys = set()
new_list = []
for d in example:
    if d["id"] not in seen_keys:
        seen_keys.add(d["id"])
        new_list.append(d)

print(new_list)

Output

[{'term': 'potato', 'id': 10}, {'term': 'apple', 'id': 7}]
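The same seen-set loop can be wrapped in a reusable generator that accepts any key function. The name unique_by is hypothetical (it is not a standard-library function); operator.itemgetter('id') serves as the key here, and the key values must be hashable.

```python
from operator import itemgetter

def unique_by(items, key):
    """Yield the first item for each distinct key(item), preserving input order."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:  # first time this key value appears
            seen.add(k)
            yield item

example = [{'term': 'potato', 'id': 10}, {'term': 'potatoes', 'id': 10}, {'term': 'apple', 'id': 7}]
new_list = list(unique_by(example, itemgetter('id')))
print(new_list)  # [{'term': 'potato', 'id': 10}, {'term': 'apple', 'id': 7}]
```

Because it is a generator, it also works on iterables that can only be consumed once, such as rows streamed from a database cursor.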

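If you are interested in an O(n) one-liner, use the following. It relies on dicts preserving insertion order (guaranteed since Python 3.7). It keeps the first dict for each id, but the result is ordered by each id's last appearance, which can differ from the loop's order when duplicates are not adjacent: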

new_list = list({d["id"]: d for d in example[::-1]}.values())[::-1]
print(new_list)

Output (from one-liner)

[{'term': 'potato', 'id': 10}, {'term': 'apple', 'id': 7}]
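The reversed one-liner keeps the first dict for each id but orders the result by each id's last appearance. A hypothetical input with non-adjacent duplicates shows how that differs from the seen-set loop:

```python
# Hypothetical input: the two id-1 dicts are separated by an id-2 dict
data = [{'term': 'a', 'id': 1}, {'term': 'b', 'id': 2}, {'term': 'c', 'id': 1}]

# Reversed dict one-liner: first dict per id, ordered by each id's last appearance
one_liner = list({d["id"]: d for d in data[::-1]}.values())[::-1]

# Seen-set loop: first dict per id, ordered by each id's first appearance
seen, loop = set(), []
for d in data:
    if d["id"] not in seen:
        seen.add(d["id"])
        loop.append(d)

print(one_liner)  # [{'term': 'b', 'id': 2}, {'term': 'a', 'id': 1}]
print(loop)       # [{'term': 'a', 'id': 1}, {'term': 'b', 'id': 2}]
```

For the example in the question both give the same result, because the duplicate ids are adjacent.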

Answer:

After slightly editing your code (the elements are already dicts, so ast.literal_eval is dropped):

example = [{'term': 'potato', 'id': 10}, {'term': 'potatoes', 'id': 10}, {'term': 'apple', 'id': 7}]
new_list = []
seen_keys = set()

for i in example:
    if i['id'] not in seen_keys:
        new_list.append(i)
        seen_keys.add(i['id'])
        
print(new_list)

Output:

[{'term': 'potato', 'id': 10}, {'term': 'apple', 'id': 7}]

Answer:

I kinda like making a generic uniqueBy function for this sort of problem:


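example = [{'term': 'potato', 'id': 10}, {'term': 'potatoes', 'id': 10}, {'term': 'apple', 'id': 7}]

def uniqueBy(f):
    # Return a function that keeps the first item for each key f(x), in input order
    def unique(a):
        seen = {}
        for x in a:
            seen.setdefault(f(x), x)  # only the first x per key is stored
        return list(seen.values())
    return unique

uniqueById = uniqueBy(lambda x: x['id'])

print(uniqueById(example))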