How to implement an efficient WordFinder?-CodePudding

I was asked the following question in an interview.

class WordFinder:
    def init(words):
    def find(characters):

Given the class above (pseudocode, fit it to language of your choice), where words is a set of words use to initialize the class, and characters is a list of characters, return all the words from words that can be formed by any subset of characters. You may do some preprocessing if needed.

Example:

words = { "wood", "word", "words" }

characters = [ 'o', 'w', 'o', 's', 'd', 'a', 'r' ]

Output: { "wood", "word", "words" }

Which I implemented in Python as follows:

from typing import Set, Sequence
import collections

class WordFinder:
    def __init__(self, words: Set[str]):
        self._words = { word: collections.Counter(word) for word in words }

    def find(self, characters: Sequence[str]) -> Sequence[str]:
        x = collections.Counter(characters)
        return [word for word, c in self._words.items() if not (c - x)]

words = {"wood", "word", "words"}
wf = WordFinder(words)
actual = wf.find(['o', 'w', 'o', 's', 'd', 'a', 'r'])
assert set(actual) == words

This works, but the find method loops over all the words each time. Is there a better way? The statement that some preprocessing can be done seems to be a hint that I didn't take.

Python or Java implementation are acceptable.

Disclaimer: I found that a similar question had been asked before, but no answer was accepted, and none of the answers are better than what I did.

CodePudding user response：

"word" is a substring/subset of "words", so you could do some preprocessing to detect those, and build a tree from that, i.e. attach "words" as a child of "word" (because it needs one additional letter 's'). When checking for a match, if "word" already failed, you do not need to traverse the rest of the tree of supersets downwards anymore. The initial initialization will be slower due to preprocessing, but the find should be faster. I assume that find is ought be be called a lot of times, but initialization only once.

I cannot produce the pseudocode for that right now, but I guess the idea might be sufficient for you to do it.

CodePudding user response：

Any optimization has prerequisites. For the test cases you give now, you have done well enough. But you want to reduce the number of loops, then I assume that each of your words has millions of characters, then you can use buckets to optimize

public class Test21 {

    public static void main(String[] args) {
        // TODO Auto-generated method stub
        Test21 ts = new Test21();
        int[] src = ts.find(new char[]{'o', 'w', 'o', 's', 'd', 'a', 'r'});
        ts.init(new String[]{"wood", "word", "words", "wooood", "w"}, src);
    }
    
    public int[] find(char[] by){
        int[] src = new int[26];
        for(int i: by){
            src[i - 97] = src[i - 97]   1; 
        }
        return src;
    }
    
    public void init(String[] tar, int[] src){
        for(String str: tar){
            if(this.compare(this.find(str.toCharArray()), src))
                System.out.println(str);
        }
    }
    
    private boolean compare(int[] tar, int[] src){
        for(int i = 0; i < src.length; i  ){
            if(tar[i] > src[i])
                return false;
        }
        return true;
    }

}