I have to write a program that stores a large number of strings and prints them in sorted order when the user asks for it. As far as I know, hash tables can't be sorted, so I was thinking of copying the elements to a temporary list, then sorting and printing that list when prompted by the user.
I chose a hash table because of its O(1) average lookup time, and the program searches far more often than it sorts.
Is it more efficient to store the strings in a hash table and sort them into a list only when the user asks, or is it better to store the strings directly in a list?
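A minimal sketch of the first option, assuming Python's built-in set stands in for the hash table. The function names are made up for illustration:

```python
# Hash-based storage: average O(1) insertion and membership test.
words = set()


def add_word(word):
    words.add(word)


def has_word(word):
    return word in words


def sorted_snapshot():
    # Copy the elements into a temporary list and sort it.
    # This costs O(n log n) each time the user asks for sorted output.
    return sorted(words)


for w in ["Car", "History", "House", "Horse"]:
    add_word(w)

print(has_word("Horse"))   # True
print(sorted_snapshot())   # ['Car', 'History', 'Horse', 'House']
```

With this layout, searches stay cheap and the sorting cost is paid only on request.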
CodePudding user response:
Assuming you need to manage this in memory (meaning you don't care about persistence), a very efficient way of managing sorted strings, both in terms of memory usage and lookup speed, is the radix tree.
Their implementation is not trivial, but you may find libraries that already implement them for you. For instance this one
A radix tree basically splits the strings into shared prefixes and builds a tree from them.
So if you have the following list of strings:
* Car
* History
* House
* Horse
The tree looks like this:
|-- Car
|-- H
    |-- istory
    |-- o
        |-- rse
        |-- use
This is efficient because the more strings you have, the more likely you are to find shared prefixes, which saves memory: each shared prefix is stored only once for all the strings that use it.
For instance, if you insert another string such as "Cloud", the algorithm looks up the shared prefix in the tree ("C"), splits that node, and adds the rest of the string. After that, the tree looks like this:
|-- C
    |-- ar
    |-- loud
|-- H
    |-- istory
    |-- o
        |-- rse
        |-- use
Lookup takes O(k) time, where k is the length of the string being looked up (at most the length of the longest stored string). The whole tree is also kept in sorted order, so printing all the strings sorted only requires a depth-first traversal that visits each node's children in order.
If you can't find a good radix tree implementation, there are other structures you may consider, such as tries in general.
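To make the splitting, lookup, and sorted traversal concrete, here is a minimal radix tree sketch in Python. It is an illustration written under simple assumptions, not the implementation of any particular library: the names `RadixNode`, `insert`, `contains`, and `sorted_words` are hypothetical, and children are kept in a plain dict keyed by edge label, so each step scans a node's children instead of indexing by first character.

```python
class RadixNode:
    def __init__(self):
        self.children = {}    # edge label (str) -> RadixNode
        self.is_end = False   # True if a stored string ends at this node


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(root, word):
    node = root
    while True:
        if not word:
            node.is_end = True
            return
        for label, child in list(node.children.items()):
            n = common_prefix_len(label, word)
            if n == 0:
                continue
            if n < len(label):
                # Split the edge: keep the shared prefix on a new middle
                # node and push the rest of the old label below it.
                mid = RadixNode()
                mid.children[label[n:]] = child
                del node.children[label]
                node.children[label[:n]] = mid
                child = mid
            node, word = child, word[n:]
            break
        else:
            # No edge shares a prefix: the remainder becomes a new leaf.
            leaf = RadixNode()
            leaf.is_end = True
            node.children[word] = leaf
            return


def contains(root, word):
    node = root
    while word:
        for label, child in node.children.items():
            if word.startswith(label):
                node, word = child, word[len(label):]
                break
        else:
            return False
    return node.is_end


def sorted_words(node, prefix=""):
    # Depth-first traversal; sibling labels start with distinct characters,
    # so visiting them in sorted order yields the strings in sorted order.
    if node.is_end:
        yield prefix
    for label in sorted(node.children):
        yield from sorted_words(node.children[label], prefix + label)


root = RadixNode()
for w in ["Car", "History", "House", "Horse", "Cloud"]:
    insert(root, w)

print(list(sorted_words(root)))  # ['Car', 'Cloud', 'History', 'Horse', 'House']
print(contains(root, "Horse"))   # True
print(contains(root, "Ho"))      # False: "Ho" is only a shared prefix
```

The order produced is by character code, so uppercase letters sort before lowercase ones. Normalize the case before inserting if that matters for your output.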
CodePudding user response:
Tries and radix trees are both reasonable choices here. A slightly less efficient but nonetheless solid choice would be some sort of set based on a balanced binary search tree (say, a red/black tree). Such a set naturally stores the elements in sorted order and supports fairly quick insertions and deletions. It's a bit slower than a hash table, but it is a great choice if you need to access or manipulate the data in sorted order.
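Python's standard library has no red/black tree, so the sketch below is only a stand-in for the interface of such a sorted set: it keeps a sorted list with the `bisect` module. Lookups are O(log n) as with a balanced tree, but insertion shifts elements and is O(n), which a real balanced tree avoids. The class name `SortedStringSet` is made up for illustration.

```python
import bisect


class SortedStringSet:
    """Sorted set of strings backed by a sorted list (not a balanced tree)."""

    def __init__(self):
        self._items = []

    def add(self, s):
        i = bisect.bisect_left(self._items, s)
        if i == len(self._items) or self._items[i] != s:
            # O(n) because of the shift; a red/black tree would be O(log n).
            self._items.insert(i, s)

    def __contains__(self, s):
        # Binary search: O(log n) comparisons.
        i = bisect.bisect_left(self._items, s)
        return i < len(self._items) and self._items[i] == s

    def __iter__(self):
        # The elements are always stored in sorted order.
        return iter(self._items)


names = SortedStringSet()
for w in ["Car", "History", "House", "Horse", "Cloud"]:
    names.add(w)

print(list(names))       # ['Car', 'Cloud', 'History', 'Horse', 'House']
print("Horse" in names)  # True
```

In a language whose standard library ships a tree-based set (for example Java's `TreeSet` or C++'s `std::set`), that type can be used directly instead.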