In order to quickly compare the keys of 2 dictionaries, I'm creating sets of the keys using this method:
dict_1 = {"file_1":10, "file_2":20, "file_3":30, "file_4":40}
dict_2 = {"file_1":10, "file_2":20, "file_3":30}
set_1 = {file for file in dict_1}
set_2 = {file for file in dict_2}
Then I use diff_set = set_1 - set_2
to see which keys are missing from set_2.
Is there a faster way? I see that set(dict.keys())
is less of a workaround, so I'll switch to it, but is it more efficient?
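For context: iterating over a dict yields its keys, so the comprehension, set(dict_1) and set(dict_1.keys()) all build the same set, and the only question is speed. A minimal check, reusing the example dictionaries from the question (the variable names via_comprehension, via_set and via_keys are just illustrative):

```python
dict_1 = {"file_1": 10, "file_2": 20, "file_3": 30, "file_4": 40}
dict_2 = {"file_1": 10, "file_2": 20, "file_3": 30}

# Iterating over a dict yields its keys, so all three build the same key set.
via_comprehension = {file for file in dict_1}
via_set = set(dict_1)
via_keys = set(dict_1.keys())

print(via_comprehension == via_set == via_keys)  # True
print(set(dict_1) - set(dict_2))  # {'file_4'}
```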
Answer:
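Let's measure more carefully (running each statement many times instead of timing a single execution, and excluding the setup from the timing) and include some faster solutions. Each row shows the three best per-execution times out of 20 rounds, followed by the statement: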
300 ns 300 ns 300 ns {*dict_1} - {*dict_2}
388 ns 389 ns 389 ns {file for file in dict_1 if file not in dict_2}
389 ns 390 ns 390 ns dict_1.keys() - dict_2
458 ns 458 ns 458 ns set(dict_1) - set(dict_2)
472 ns 472 ns 472 ns dict_1.keys() - dict_2.keys()
665 ns 665 ns 668 ns set(dict_1.keys()) - set(dict_2.keys())
716 ns 716 ns 716 ns {file for file in dict_1} - {file for file in dict_2}
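Why the faster variants work: in Python 3, dict.keys() returns a set-like view, and its "-" operator accepts any iterable on the right (iterating a dict yields its keys) and returns an ordinary set, so dict_1.keys() - dict_2 needs no explicit set() call. {*d} unpacks the keys into a set display, and the filtering comprehension builds only one set, testing membership against dict_2 directly. A small sketch of these behaviours, using the same example data; the speed ranking itself comes from the measurements above and may differ with other Python versions or dictionary sizes:

```python
from collections.abc import Set

dict_1 = {"file_1": 10, "file_2": 20, "file_3": 30, "file_4": 40}
dict_2 = {"file_1": 10, "file_2": 20, "file_3": 30}

# dict.keys() returns a view that is registered as an abstract Set.
print(isinstance(dict_1.keys(), Set))  # True

# The view's "-" accepts any iterable on the right and returns a plain set.
diff = dict_1.keys() - dict_2
print(type(diff).__name__, diff)  # set {'file_4'}

# {*d} unpacks the dict's keys into a set display, like set(d).
print({*dict_1} - {*dict_2})  # {'file_4'}

# Builds a single set; "not in dict_2" is a hashed dict lookup.
print({file for file in dict_1 if file not in dict_2})  # {'file_4'}
```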
Benchmark code:
import timeit

# The dictionaries are built in the setup, so their creation is not timed.
setup = '''
dict_1 = {"file_1":10, "file_2":20, "file_3":30, "file_4":40}
dict_2 = {"file_1":10, "file_2":20, "file_3":30}
'''

codes = [
    '{file for file in dict_1} - {file for file in dict_2}',
    'set(dict_1) - set(dict_2)',
    'set(dict_1.keys()) - set(dict_2.keys())',
    'dict_1.keys() - dict_2',
    'dict_1.keys() - dict_2.keys()',
    '{*dict_1} - {*dict_2}',
    '{file for file in dict_1 if file not in dict_2}',
]

# Sanity check: every solution should print the same result.
exec(setup)
for code in codes:
    print(eval(code))

tss = [[] for _ in codes]
for _ in range(20):
    print()
    # Time each solution: keep the best of the repeats, per single execution.
    for code, ts in zip(codes, tss):
        number = 10000
        t = min(timeit.repeat(code, setup, number=number)) / number
        ts.append(t)
    # Print the three best times so far for each solution, fastest first.
    for code, ts in sorted(zip(codes, tss), key=lambda cs: sorted(cs[1])):
        print(*('%3d ns ' % (t * 1e9) for t in sorted(ts)[:3]), code)
Answer:
A simpler and faster way than the comprehensions is to convert your dictionaries to sets directly with set(), since iterating over a dict yields its keys:
diff_set = set(dict_1) - set(dict_2)
Output:
{'file_4'}
Proof (execution time comparison):
Method 1 (method from the question):
import timeit
start1 = timeit.default_timer()
dict_1 = {"file_1":10, "file_2":20, "file_3":30, "file_4":40}
dict_2 = {"file_1":10, "file_2":20, "file_3":30}
set_1 = {file for file in dict_1}
set_2 = {file for file in dict_2}
diff_set = set_1 - set_2
stop1 = timeit.default_timer()
execution_time1 = stop1 - start1
print(f"It took {execution_time1} time for method 1")
Method 2 (suggested answer):
import timeit
start2 = timeit.default_timer()
dict_3 = {"file_1":10, "file_2":20, "file_3":30, "file_4":40}
dict_4 = {"file_1":10, "file_2":20, "file_3":30}
diff_set = set(dict_3) - set(dict_4)
stop2 = timeit.default_timer()
execution_time2 = stop2 - start2
print(f"It took {execution_time2} time for method 2")
Method 3 (dict.keys()):
import timeit
start3 = timeit.default_timer()
dict_5 = {"file_1":10, "file_2":20, "file_3":30, "file_4":40}
dict_6 = {"file_1":10, "file_2":20, "file_3":30}
diff_set = set(dict_5.keys()) - set(dict_6.keys())
stop3 = timeit.default_timer()
execution_time3 = stop3 - start3
print(f"It took {execution_time3} time for method 3")
Output:
It took 9.499999578110874e-06 time for method 1
It took 6.399990525096655e-06 time for method 2
It took 8.699993486516178e-06 time for method 3
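The timings above come from a single run that also includes building the dictionaries, so they are easily dominated by noise. A minimal sketch of a steadier comparison of the same three methods, assuming Python 3.5+ (for the globals parameter of timeit.repeat); the dictionaries are created once and only the set difference is timed:

```python
import timeit

dict_1 = {"file_1": 10, "file_2": 20, "file_3": 30, "file_4": 40}
dict_2 = {"file_1": 10, "file_2": 20, "file_3": 30}

# The three methods from this answer; only the expression itself is timed.
methods = {
    "method 1": "{file for file in dict_1} - {file for file in dict_2}",
    "method 2": "set(dict_1) - set(dict_2)",
    "method 3": "set(dict_1.keys()) - set(dict_2.keys())",
}

number = 10_000  # executions per measurement
results = {}
for name, stmt in methods.items():
    # repeat() returns one total time per repetition; the minimum is the
    # least noisy estimate, divided by number for the per-execution time.
    best = min(timeit.repeat(stmt, globals=globals(), number=number, repeat=5))
    results[name] = best / number
    print(f"{name}: {results[name] * 1e9:.0f} ns per execution")
```

In the measurements of the first answer, method 2 came out ahead of methods 3 and 1; exact numbers will vary by machine and Python version.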