elasticsearch tokenizer to generate tokens for special characters


I have a use case where special characters should also be searchable. I have tried several tokenizers such as char_group, standard, and n-gram. With an n-gram tokenizer I am able to make special characters searchable (since it generates a token for each character), but n-gram produces far too many tokens, so I would rather not use it. For example, if the text is "hey john.s #100 is a test name", then the tokenizer should create the tokens [hey, john, s, #, 100, is, a, test, name].

Please refer to this question for a detailed explanation.

Thank you.

CodePudding user response:

Based on your use case, the best option would be to use the whitespace tokenizer in combination with the word delimiter graph token filter; a sketch of such a setup follows the documentation links below.

For more information, see the official Elasticsearch documentation on the whitespace tokenizer and the word delimiter graph token filter:

https://www.elastic.co/guide/en/elasticsearch/reference/8.4/analysis-whitespace-tokenizer.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-graph-tokenfilter.html
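As a rough sketch of what that could look like (the index name my-index, the analyzer and filter names, and the type_table entry for # are my own illustrative choices, not something from the question), the settings below use the whitespace tokenizer plus a word_delimiter_graph filter whose type_table treats # as an alphabetic character, so it is kept as its own token instead of being stripped as a delimiter:

PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "my_word_delimiter" ]
        }
      },
      "filter": {
        "my_word_delimiter": {
          "type": "word_delimiter_graph",
          "type_table": [ "# => ALPHA" ]
        }
      }
    }
  }
}

With settings along those lines, analyzing "hey john.s #100 is a test name" (for example via POST /my-index/_analyze with "analyzer": "my_analyzer") should yield hey, john, s, #, 100, is, a, test, name: the whitespace tokenizer splits on spaces, and word_delimiter_graph then splits john.s at the period and #100 at the letter/number boundary, keeping # because the type_table maps it to ALPHA.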
