1, a manually maintained keyword library (fewer than 100,000 entries). Key features: it covers multiple languages (including English), and the keywords are fairly long (average length about 42 characters)
2, given a piece of text, generally 500-1000 characters long, which may also be multilingual. Since the mix of languages cannot be determined in advance, every keyword has to be matched against it
Requirements:
1, find every occurrence of each keyword in the text, returning both the position and the keyword (for highlighting)
2, the keyword match is case-insensitive
3, keyword matching tolerates a few wrong characters (for example, a keyword 30 to 50 characters long may match with up to two character errors)
For example:
Keyword: Caffeine, allowing one wrong character
This should match any of the following in the text: caffein, caffeina, caffeine, etc.
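To make the matching rule in the example concrete, here is a minimal sketch of approximate substring matching by dynamic programming (often called Sellers' algorithm). The function name find_approx and its return format are assumptions for illustration, not something from the original post, and "wrong character" is interpreted here as one insertion, deletion, or substitution.

```python
def find_approx(text, keyword, max_errors):
    """Return (start, end, errors) for spans of text within max_errors edits of keyword.

    Matching is case-insensitive. Offsets assume that lowercasing does not
    change the length of the text (true for ASCII, not for every Unicode string).
    """
    t = text.lower()
    p = keyword.lower()
    m = len(p)
    # prev[i] = (cost, start) of the best alignment of p[:i] that ends at the
    # current text position; before any text is read, p[:i] costs i deletions.
    prev = [(i, 0) for i in range(m + 1)]
    hits = []
    for j, ch in enumerate(t, start=1):
        # An empty keyword prefix matches at zero cost, starting at offset j.
        cur = [(0, j)]
        for i in range(1, m + 1):
            substitute = (prev[i - 1][0] + (p[i - 1] != ch), prev[i - 1][1])
            missing = (cur[i - 1][0] + 1, cur[i - 1][1])  # keyword char absent from text
            extra = (prev[i][0] + 1, prev[i][1])          # extra char present in text
            cur.append(min(substitute, missing, extra))
        if cur[m][0] <= max_errors:
            hits.append((cur[m][1], j, cur[m][0]))
        prev = cur
    return hits


text = "I drink caffein daily"
for start, end, errors in find_approx(text, "Caffeine", 1):
    print(start, end, errors, repr(text[start:end]))
```

Each hit gives the slice text[start:end] to highlight. Neighbouring end positions can both qualify, so overlapping hits may need merging. The cost is proportional to text length times keyword length for each keyword, so with a library of this size it is probably best used to verify candidates that a cheaper filter has already selected.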
CodePudding user response:
I wrote a simple brute-force matcher myself: it walks through the text one character at a time and uses the first letter to filter which keywords to compare. With the data volume described above, exact matching runs at an acceptable speed, but once a few wrong characters are allowed the processing time rises to 2-3 minutes, which is completely unacceptable. Any help from the experts would be appreciated.
CodePudding user response:
An embedded database would be a good fit for this.
CodePudding user response:
I don't see how an embedded database relates to this requirement. Please explain in detail.
CodePudding user response:
I suggest the original poster look into automata (state machines). Because of the error-tolerance requirement, the automaton will need some adaptation.
CodePudding user response:
Split each keyword into fragments and use them as index keys. For example, caffein can be split into caf, fei, ein, and those fragments are then indexed. At match time, split the search term the same way: for Caffeine, look up caf, fei, ine.
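The fragment-index idea can be sketched roughly as follows. This is a variant, not the exact scheme described above: instead of 3-character fragments, each keyword is cut into max_errors + 1 non-overlapping pieces, because any occurrence with at most max_errors wrong characters must still contain at least one piece unchanged. The names split_pieces, build_index and candidates are hypothetical, and the result is only a candidate set that still needs a separate edit-distance check.

```python
from collections import defaultdict


def split_pieces(keyword, max_errors):
    """Cut the lowercased keyword into max_errors + 1 pieces of near-equal length."""
    word = keyword.lower()
    count = max_errors + 1
    size, extra = divmod(len(word), count)
    pieces, pos = [], 0
    for i in range(count):
        length = size + (1 if i < extra else 0)
        pieces.append(word[pos:pos + length])
        pos += length
    return pieces


def build_index(keywords, max_errors):
    """Map every piece to the set of keywords that contain it."""
    index = defaultdict(set)
    for kw in keywords:
        for piece in split_pieces(kw, max_errors):
            # Assumption: keywords are longer than max_errors + 1 characters,
            # so no piece is empty; empty pieces are skipped and would weaken the filter.
            if piece:
                index[piece].add(kw)
    return index


def candidates(text, index):
    """Return keywords with at least one piece occurring exactly in the text."""
    t = text.lower()
    found = set()
    for length in {len(piece) for piece in index}:
        for i in range(len(t) - length + 1):
            found |= index.get(t[i:i + length], set())
    return found


index = build_index(["Caffeine", "Theobromine"], max_errors=1)
print(candidates("I drink caffein daily", index))
```

With keywords averaging about 42 characters and two allowed errors, each piece is roughly 14 characters long, so few keywords should survive the filter for a given text. Only those survivors then need the expensive approximate comparison.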