I have a std::vector<std::string>
that has 43,000 dictionary words. I have about 315,000 maybe-words and for each one I need to determine if it's a valid word or not. This takes a few seconds and I need to complete the task as fast as possible.
Any ideas on the best way to complete this? Currently I iterate through on each attempt:
for (std::string word : words) {
if (!(std::find(dictionary.begin(), dictionary.end(), word) != dictionary.end())) {
// The word is not the dictionary
return false;
}
}
return true;
Is there a better way to iterate multiple times? I have a few hypothesis, such as
- Create a cache of invalid words, since the 315,000 list probably has 25% duplicates
- Only compare with words of the same length
Is there a better way to do this? I'm interested in an algorithm or idea.
CodePudding user response:
Is there a better way to iterate multiple times?
Yes. Convert the vector to another data structure that supports faster lookups. The standard library comes with std::set
and std::unordered_set
which are both likely to be faster than repeated linear search. Other data structures may be even more efficient.
If your goal is to create a range of words or non-words in the maybe set, then another efficient approach would be to sort both vectors, and use std::(ranges::)set_intersection
or std::(ranges::)set_difference
.