Speed difference between a custom function and dict.get(key) for a Python dictionary

Time:02-21

I want to improve the performance of a large codebase that processes log files. It contains a custom function that looks up a list of keys in a dictionary. In most cases, however, the list contains only one key. So my question is: if I replace this function with the built-in dict.get(key) method, will I see a significant performance increase? The custom function is the following:

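def find_value(some_dict, list_of_keys):
    # Returns the value of the last key, or None as soon as any key is missing.
    field = None  # avoids UnboundLocalError when list_of_keys is empty
    for key in list_of_keys:
        if key in some_dict:
            field = some_dict[key]
        else:
            return None
    return field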

If I replace the above function with a plain some_dict.get(keys[0]) in every case where the list has length 1, will performance improve? There are hundreds of such calls in the file, and because they sit inside loops, the function is executed many thousands or even millions of times.
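Before measuring speed, it is worth confirming the swap is safe. For a single key, find_value(d, [k]) returns d[k] when k is present and None otherwise, which is exactly what d.get(k) returns (its default is None). The check below is a minimal sketch; the dictionary `sample` and its contents are made up for illustration and stand in for the real log data.

```python
def find_value(some_dict, list_of_keys):
    field = None
    for key in list_of_keys:
        if key in some_dict:
            field = some_dict[key]
        else:
            return None
    return field


# Hypothetical sample data standing in for the real log-processing dictionaries.
sample = {'a': 1, 'b': 2, 'c': None}

for k in ['a', 'b', 'c', 'missing']:
    # Single-key lookups behave identically either way.
    assert find_value(sample, [k]) == sample.get(k)

# With several keys, find_value returns the value of the LAST key,
# but only if every key is present; otherwise it returns None.
print(find_value(sample, ['a', 'b']))        # 2
print(find_value(sample, ['a', 'missing']))  # None
```

Note that neither approach can distinguish a missing key from a key whose stored value is None (key 'c' above), so the replacement keeps that behavior unchanged.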

Answer:

Based on some very informal benchmarking with timeit, using dict.get directly where keys holds a single key looks faster. It's hard to say how much faster without testing with the dictionaries you would actually encounter in your application, so I would recommend doing further benchmarking.

Benchmark script

import timeit


def find_value(some_dict, keys):
    for key in keys:
        if key in some_dict:
            field = some_dict[key]
        else:
            return None
    return field


def main():
    d = {
        'a': 1,
        'b': 2,
        'c': 3
    }

    iterations = 100_000
    original_present = timeit.timeit(lambda: find_value(d, ['b']), number=iterations)
    get_present = timeit.timeit(lambda: d.get('b'), number=iterations)

    original_absent = timeit.timeit(lambda: find_value(d, ['d']), number=iterations)
    get_absent = timeit.timeit(lambda: d.get('d'), number=iterations)

    print('Original Present:', original_present)
    print('Original Absent:', original_absent)
    print('Just Get Present:', get_present)
    print('Just Get Absent:', get_absent)


if __name__ == '__main__':
    main()

Results (total seconds for 100,000 calls each)

Original Present: 0.018157875
Original Absent: 0.012847083000000002
Just Get Present: 0.0078042089999999995
Just Get Absent: 0.0066016669999999986

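To answer your question directly: it depends, but the benchmark above indicates that dict.get alone is faster, roughly twice as fast in this run (about 0.0078 s vs 0.0182 s when the key is present, and 0.0066 s vs 0.0128 s when it is absent). Whether that gain is meaningful for your workload is up to you.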
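For the further benchmarking suggested above, one option is timeit.repeat with string statements and a globals namespace. This avoids wrapping every call in a lambda (which adds the same fixed overhead to all four numbers and so compresses the relative gap), and it reports the fastest of several runs. This is a sketch under assumptions: the 50-field `record` dictionary and the helper `best_time` are hypothetical, so substitute dictionaries and keys taken from your actual log files.

```python
import timeit


def find_value(some_dict, list_of_keys):
    field = None
    for key in list_of_keys:
        if key in some_dict:
            field = some_dict[key]
        else:
            return None
    return field


def best_time(stmt, namespace, number=10_000, repeat=5):
    """Return the fastest of several timing runs, in seconds per call."""
    runs = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
    return min(runs) / number


# Hypothetical stand-in for a real log record; replace with representative data.
record = {f'field_{i}': i for i in range(50)}
namespace = {'find_value': find_value, 'record': record}

results = {
    'find_value present': best_time("find_value(record, ['field_7'])", namespace),
    'dict.get present': best_time("record.get('field_7')", namespace),
    'find_value absent': best_time("find_value(record, ['nope'])", namespace),
    'dict.get absent': best_time("record.get('nope')", namespace),
}

for label, seconds in results.items():
    print(f'{label:>20}: {seconds * 1e9:8.1f} ns per call')
```

Taking the minimum rather than the mean follows the advice in the timeit documentation: the lowest value is the best estimate of how fast the code can run, while higher values mostly reflect interference from other processes.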

Answer:

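It may depend on the scenario, but the built-in dict.get is faster. A dictionary stores values in a hash table: it applies a hash function to the key and stores the value in the slot given by that hash. To access a value, it hashes the key again and fetches the value from that slot directly, usually in constant time on average. The custom function relies on the same hash table, but it performs two lookups (key in some_dict, then some_dict[key]) plus a Python-level function call and loop, while dict.get does a single lookup that, in CPython, is implemented in C.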

For a better understanding, see: How are Python's Built In Dictionaries Implemented?
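Both approaches use the same hash table, so hashing by itself is not what separates them. One visible difference is the number of lookups: find_value checks key in some_dict and then reads some_dict[key], hashing the key twice, while dict.get hashes it once. The sketch below makes this concrete with a hypothetical key class, CountingKey, that counts calls to its own __hash__ (built-in str keys cache their hash, so the count would not show up with plain strings).

```python
class CountingKey:
    """Hypothetical key type that counts how often it is hashed."""
    hash_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.name == other.name


def find_value(some_dict, list_of_keys):
    field = None
    for key in list_of_keys:
        if key in some_dict:
            field = some_dict[key]
        else:
            return None
    return field


key = CountingKey('b')
d = {key: 2}  # building the dict hashes the key once; reset below

CountingKey.hash_calls = 0
find_value(d, [key])
hashes_find_value = CountingKey.hash_calls

CountingKey.hash_calls = 0
d.get(key)
hashes_get = CountingKey.hash_calls

print('find_value hashes:', hashes_find_value)  # 2
print('dict.get hashes:', hashes_get)           # 1
```

The double lookup is only part of the cost; the extra function call, list construction, and loop in find_value also contribute to the gap seen in the benchmark.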
