Home > database >  Speed up typical "find" of linked list by pairing with bubble sort?
Speed up typical "find" of linked list by pairing with bubble sort?

Time:02-27

I'm making a forth-like language that is supposed to run in places memory is a premium.

The language uses a linked list for it's dictionary of language words due to space constraints and lack of resizeable memory.

However, it hurts me to think of the performance costs of the linked list lookup. I understand that the worst case will always be O(n), but I was trying to think of ways to at least improve the typical case when I realized something: what if the "find" method, in addition to finding a key, also performs a single bubble-sort-like operation each time a key is hit. This way the most common keys will "bubble" to the top. Even better, the entire list will be re-weighted as compilation continues and should roughly correlate to a key's continuous statistical likelihood.

Has this technique been used in other places? I'm curious if there is a decent mathematical demonstration of it's runtime complexity (assuming a statistical curve of some names vs others). A single bubble sort operation is clearly O(1) so at least it can't hurt the theoretical runtime complexity.

CodePudding user response:

While this strategy should improve the average run-time in common cases, this does not change the worst-case complexity which is O(n).

Indeed, if the searched key is in the end of the list of size n, a find will run in O(n) time. The bubble-swap operation runs in O(1) time (assuming the key can actually be compared in constant time). The next find operation is a bit faster if the same key is fetched but still O(n). After n fetches of the same key, fetching this key can be done in O(1) time. However, fetching n other key in a specific order can reorder the list so that the initial key is put at the end. More specifically, fetching the item next to the initial key n times does that. In the end, fetching n time the initial key and n other keys in a specific order results in (n n-1 n-2 ... 2 1) (1 2 ... n-1 n) = n*(n-1) = n²-n = O(n²) operations. After these operations, the list should be in the same state then the initial one. Thus, the complexity is clearly O(n).

Note that you can implement a cache to speed up fetches. Many policies exists. You can find some of them described here. Note that this should not impact the complexity, but will certainly greatly improve the execution time. The cache can store an iterator to a node of the linked list. Iterators are not invalidated when items are inserted/deleted (unless the target item is actually deleted).

Note also that linked lists are generally very slow. They are not very efficient in term of memory usage too because the pointer to the next item take some space (8 bytes on a 64-bit architecture). Allocated nodes can also require some hidden space regarding the standard library allocator used (some store pointer metadata like the allocated space). A solution to use less memory is to use linked list containing buckets of key-value pairs.

Note that while balanced binary search trees require a bit more memory, they can be much more efficient to solve this problem. The complexity of finding a key in such data structure is O(log n). A good hash-map implementation can be quite compact in memory (see hopscotch hashing) too although the memory consumption of the hash-map can be quite big when it is resized.

  • Related