I want to make a container with few elements and I will be just checking whether an element is part of that set or not. I know a vector would not be the appropriate container if the set is big enough since each research would be worst-case O(n) and there are better options that use a hash function or binary trees. However I was wondering what happens if my set has few elements (eg just 5) and I know it in advance, is it worth it to implement the container as a structure having a hash function? Maybe if the set is not big enough the overhead introduced by having to apply the hash function is greater than having to iterate through 5 elements.
For example in C using std::unordered_set instead of std::vector. As always, thank you
CodePudding user response:
There are many factors affecting the point where the std::vector
falls behind other approaches. See std::vector faster than std::unordered_set? and Performance of vector sort/unique/erase vs. copy to unordered_set for some of the reasons why this is the case. As a consequence, any calculation of this point would have to be rather involved and complicated. The most expedient way to find this point is performance testing.
Keep in mind that some of the factors are based on the hardware being used. So not only do you need performance testing on your development machine, but also performance testing on "typical" end-user machines. Your development machine can give you a gut check, though, (as can an online tool like Quick Bench) letting you know that you are not even in the ballpark yet. (The intuition of individual programmers is notoriously unreliable. Some people see 100 as a big number and worry about performance; others don't worry until numbers hit the billions. Either of these people would be blown away by the others' view.)
Given the difficulty in determining the point where std::vector
falls behind, I think this is a good place to give a reminder that premature optimization is usually a waste of time. Investigating this performance out of curiosity is fine, but until this is identified as a performance bottleneck, don't hold up work on more important aspects of your project for this. Choose the approach that fits your code best.
That being said, I personally would assume the break point is well, well over 10 items. So for the 5-element collection of the question, I would be inclined to use a vector and not look back.