I need to find dictionaries in a list that have the same key-value, and create a new list in which only the first dictionary is kept.
Example list:
lst_in = [{'First': 1, 'Second': 4}, {'First': 2, 'Second': 5}, {'First': 3, 'Second': 4}]
Duplicate key-value to iterate over should be 'Second'. So in the example the first and third dictionary are the same. I have tried looking at Find duplicates in python list of dictionaries and python list of dictionaries find duplicates based on value, but I can't find the exact answer. I am only looking at one key-value. The dictionaries will always have the same keys.
Expected output:
lst_out = [{'First': 1, 'Second': 4}, {'First': 2, 'Second': 5}]
CodePudding user response:
Some solutions and a benchmark.
Solutions
Fun with a dict, going forwards to get the order and backwards to get the first value.
lst_out = list({d['Second']: d
for s in [1, -1]
for d in lst_in[::s]}.values())
Or using setdefault
to keep track of each value's first dict:
tmp = {}
for d in lst_in:
tmp.setdefault(d['Second'], d)
lst_out = list(tmp.values())
Fun and potentially faster version:
add = {}.setdefault
for d in lst_in:
add(d['Second'], d)
lst_out = list(add.__self__.values())
Benchmark
Times for a list of 1000 dicts with 100 different Second
values (using Python 3.10.0):
361 μs 362 μs 364 μs dict_forward_backward
295 μs 297 μs 297 μs dict_setdefault
231 μs 231 μs 232 μs dict_setdefault_optimized
196 μs 196 μs 197 μs set_in_list_comprehension
190 μs 190 μs 190 μs set_in_list_comprehension_optimized
191 μs 191 μs 191 μs set_in_list_comprehension_optimized_2
201 μs 201 μs 201 μs set_with_loop
1747 μs 1751 μs 1774 μs with_lists
Benchmark code:
from timeit import repeat, default_timer as timer
from random import choices
lst_in = [{'First': i, 'Second': v}
for i, v in enumerate(choices(range(100), k=1000))]
def dict_forward_backward(lst_in):
return list({d['Second']: d
for s in [1, -1]
for d in lst_in[::s]}.values())
def dict_setdefault(lst_in):
tmp = {}
for d in lst_in:
tmp.setdefault(d['Second'], d)
return list(tmp.values())
def dict_setdefault_optimized(lst_in):
add = {}.setdefault
for d in lst_in:
add(d['Second'], d)
return list(add.__self__.values())
def set_in_list_comprehension(lst_in):
return [s.add(v) or d
for s in [set()]
for d in lst_in
for v in [d['Second']]
if v not in s]
def set_in_list_comprehension_optimized(lst_in):
return [add(v) or d
for s in [set()]
for add in [s.add]
for d in lst_in
for v in [d['Second']]
if v not in s]
def set_in_list_comprehension_optimized_2(lst_in):
s = set()
add = s.add
return [add(v) or d
for d in lst_in
for v in [d['Second']]
if v not in s]
def set_with_loop(lst_in):
found = set()
lst_out = []
for dct in lst_in:
if dct['Second'] not in found:
lst_out.append(dct)
found.add( dct['Second'] )
return lst_out
def with_lists(lst_in):
out = {'keep':[], 'counter':[]}
for dct in lst_in:
if dct['Second'] not in out['counter']:
out['keep'].append(dct)
out['counter'].append(dct['Second'])
return out['keep']
funcs = [
dict_forward_backward,
dict_setdefault,
dict_setdefault_optimized,
set_in_list_comprehension,
set_in_list_comprehension_optimized,
set_in_list_comprehension_optimized_2,
set_with_loop,
with_lists,
]
# Correctness
expect = funcs[0](lst_in)
for func in funcs[1:]:
result = func(lst_in)
print(result == expect, func.__name__)
print()
# Speed
for _ in range(3):
for func in funcs:
ts = sorted(repeat(lambda: func(lst_in), 'gc.enable(); gc.collect()', number=1000))[:3]
print(*('M μs ' % (t * 1e3) for t in ts), func.__name__)
print()
CodePudding user response:
Sets are great for those "have I already seen this?" problems.
lst_in = [{'First': 1, 'Second': 4}, {'First': 2, 'Second': 5}, {'First': 3, 'Second': 4}]
found = set()
lst_out = []
for dct in lst_in:
if dct['Second'] not in found:
lst_out.append(dct)
found.add( dct['Second'] )
CodePudding user response:
You can keep a tracker for the Second
values and check for it as you add dictionaries to the final list:
lst_in = [{'First': 1, 'Second': 4}, {'First': 2, 'Second': 5}, {'First': 3, 'Second': 4}]
out = {'keep':[], 'counter':[]}
for dct in lst_in:
if dct['Second'] not in out['counter']:
out['keep'].append(dct)
out['counter'].append(dct['Second'])
print(out['keep'])
Output:
[{'First': 1, 'Second': 4}, {'First': 2, 'Second': 5}]