I was asked the following question in an interview.
class WordFinder:
def init(words):
def find(characters):
Given the class above (pseudocode, fit it to language of your choice), where
words
is a set of words use to initialize the class, andcharacters
is a list of characters, return all the words fromwords
that can be formed by any subset ofcharacters
. You may do some preprocessing if needed.Example:
words = { "wood", "word", "words" }
characters = [ 'o', 'w', 'o', 's', 'd', 'a', 'r' ]
Output: { "wood", "word", "words" }
Which I implemented in Python as follows:
from typing import Set, Sequence
import collections
class WordFinder:
def __init__(self, words: Set[str]):
self._words = { word: collections.Counter(word) for word in words }
def find(self, characters: Sequence[str]) -> Sequence[str]:
x = collections.Counter(characters)
return [word for word, c in self._words.items() if not (c - x)]
words = {"wood", "word", "words"}
wf = WordFinder(words)
actual = wf.find(['o', 'w', 'o', 's', 'd', 'a', 'r'])
assert set(actual) == words
This works, but the find
method loops over all the words each time. Is there a better way? The statement that some preprocessing can be done seems to be a hint that I didn't take.
Python or Java implementation are acceptable.
Disclaimer: I found that a similar question had been asked before, but no answer was accepted, and none of the answers are better than what I did.
CodePudding user response:
"word"
is a substring/subset of "words"
, so you could do some preprocessing to detect those, and build a tree from that, i.e. attach "words"
as a child of "word"
(because it needs one additional letter 's'
). When checking for a match, if "word"
already failed, you do not need to traverse the rest of the tree of supersets downwards anymore. The initial initialization will be slower due to preprocessing, but the find
should be faster. I assume that find
is ought be be called a lot of times, but initialization only once.
I cannot produce the pseudocode for that right now, but I guess the idea might be sufficient for you to do it.
CodePudding user response:
Any optimization has prerequisites. For the test cases you give now, you have done well enough. But you want to reduce the number of loops, then I assume that each of your words has millions of characters, then you can use buckets
to optimize
public class Test21 {
public static void main(String[] args) {
// TODO Auto-generated method stub
Test21 ts = new Test21();
int[] src = ts.find(new char[]{'o', 'w', 'o', 's', 'd', 'a', 'r'});
ts.init(new String[]{"wood", "word", "words", "wooood", "w"}, src);
}
public int[] find(char[] by){
int[] src = new int[26];
for(int i: by){
src[i - 97] = src[i - 97] 1;
}
return src;
}
public void init(String[] tar, int[] src){
for(String str: tar){
if(this.compare(this.find(str.toCharArray()), src))
System.out.println(str);
}
}
private boolean compare(int[] tar, int[] src){
for(int i = 0; i < src.length; i ){
if(tar[i] > src[i])
return false;
}
return true;
}
}