Home > Software design >  How to quickly search a large vector many times?
How to quickly search a large vector many times?

Time:11-23

I have a std::vector<std::string> that has 43,000 dictionary words. I have about 315,000 maybe-words and for each one I need to determine if it's a valid word or not. This takes a few seconds and I need to complete the task as fast as possible.

Any ideas on the best way to complete this? Currently I iterate through on each attempt:

for (std::string word : words) {
    if (!(std::find(dictionary.begin(), dictionary.end(), word) != dictionary.end())) {
        // The word is not the dictionary
        return false;
    }
}
return true;

Is there a better way to iterate multiple times? I have a few hypothesis, such as

  • Create a cache of invalid words, since the 315,000 list probably has 25% duplicates
  • Only compare with words of the same length

Is there a better way to do this? I'm interested in an algorithm or idea.

CodePudding user response:

Is there a better way to iterate multiple times?

Yes. Convert the vector to another data structure that supports faster lookups. The standard library comes with std::set and std::unordered_set which are both likely to be faster than repeated linear search. Other data structures may be even more efficient.

If your goal is to create a range of words or non-words in the maybe set, then another efficient approach would be to sort both vectors, and use std::(ranges::)set_intersection or std::(ranges::)set_difference.

  • Related