I have a use case where special characters should also be searchable. I have tried several tokenizers such as char_group, standard, and n-gram. With an n-gram tokenizer I am able to make special characters searchable (since it generates a token for each character), but it produces far too many tokens, so I don't want to use it. For example, if the text is hey john.s #100 is a test name, the tokenizer should create the tokens [hey, john, s, #, 100, is, a, test, name].
Please refer to this question for a detailed explanation.
Thank you.
CodePudding user response:
Based on your use case, the best option would be to use a whitespace tokenizer combined with a word_delimiter_graph token filter.
For more information, check the official Elasticsearch documentation on the whitespace tokenizer and the word delimiter graph filter:
https://www.elastic.co/guide/en/elasticsearch/reference/8.4/analysis-whitespace-tokenizer.html
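For reference, here is a minimal sketch of such a setup (the index name my-index, the analyzer name my_analyzer, and the filter name my_word_delimiter are placeholders, not names from the question):

PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "my_word_delimiter" ]
        }
      },
      "filter": {
        "my_word_delimiter": {
          "type": "word_delimiter_graph"
        }
      }
    }
  }
}

You can then check the generated tokens with the _analyze API:

POST /my-index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "hey john.s #100 is a test name"
}

With default settings this should produce tokens like [hey, john, s, 100, is, a, test, name], because the filter splits on the special characters but drops the delimiter characters themselves. Keeping the # itself as a searchable token may need extra tuning (for example a type_table entry on the filter), so treat this as a starting point rather than an exact match for the token list in the question.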