Home > OS >  Python efficient implementation of a fix-sized ordered by value hash map
Python efficient implementation of a fix-sized ordered by value hash map

Time:01-15

I want to keep a list of the top N entries based on their value, and update it every time a larger value is received. For example, if N=3

LIST
id: 'abc' --> 323
id: 'cbs' --> 321
id: 'aac' --> 123

New entry: id: 'aaa' --> 101. Ignore
New entry: id: 'zzz' --> 111. Ignore
New entry: id: 'cwl' --> 322. Update list

LIST
id: 'abc' --> 323
id: 'cwl' --> 322
id: 'cbs' --> 321

In C this could be implemented with std::map of size N, adding to it and deleting its last entry. What's the best way in Python? Assume version > 3.6

CodePudding user response:

First of all, your comparison with C wouldn't work so easily, since std::map is sorted by key and not by value (though you could work around it by keeping also the inverse map in parallel).

For implementing something similar in python, first of all you can realize that for your specific case you don't need the elements to be sorted the whole time, the important thing you care about is to know what your smallest element is. You can achieve this easily by keeping a heap (minheap) of max size N with all values you have so far. When you get a new element, you just need to compare the value with the head (smallest element) of your heap, and if it is bigger, remove the head of the heap (and the corresponding entry in your map), and add the new element to the map and the value to the heap. Again, you would need to keep also the inverse map in order to know which element to delete from the map.

Python has a heap implementation in the module heapq, see reference here.

EDIT

For completeness, I will add here that although there is no native implementation of sorted maps in python, there are external packages you can use for this, e.g. sortedmap, but since for your specific case this is not necessary, I would still propose to go with a heap that does not require external packages and not add unnecessary dependencies.

  • Related