Create all possible tokens in order in elasticsearch

Time:10-07

I am trying to create an analyzer that returns all possible tokens. For example, for the input AB-12-1993 xyz.pdf, the generated tokens should include AB, AB-12, -12-1993, 12-1993, -1993, 1993, AB-12-1993 xyz, xyz, xyz.pdf, and AB-12-1993 xyz.pdf. Extra tokens are not a problem, but these must be generated.

I have tried the whitespace analyzer with ngram, but -12-1993, 12-1993, -1993, and 1993 are not generated.

I have also tried this with different analyzers, but it did not help.

I am using Elasticsearch 8.3.3. Can somebody please help me out?

CodePudding user response:

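You can use the definition below for your analyzer. The keyword tokenizer keeps the whole input as a single token, and the ngram filter then emits every substring of 2 to 10 characters, which covers most of your required tokens. Tokens longer than 10 characters, such as AB-12-1993 xyz (14 characters) and AB-12-1993 xyz.pdf (18 characters), need a larger max_gram, and index.max_ngram_diff must be at least max_gram minus min_gram.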

PUT ngram_custom_example
{
  "settings": {
    "index": {
      "max_ngram_diff": 10
    },
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "keyword",
          "filter": [ "2_10_grams" ]
        }
      },
      "filter": {
        "2_10_grams": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 10
        }
      }
    }
  }
}
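
To see which of the required tokens a given min_gram/max_gram pair can cover, here is a rough sketch in plain Python. It does not call Elasticsearch. It only emulates the behaviour described above, assuming the keyword tokenizer passes the input through as one token and the ngram filter emits every substring whose length lies between min_gram and max_gram. The helper name char_ngrams is made up for this illustration.

```python
def char_ngrams(text, min_gram, max_gram):
    """Return every substring of text with length in [min_gram, max_gram]."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(text):
                break  # longer grams would run past the end of the text
            grams.append(text[start:start + size])
    return grams


text = "AB-12-1993 xyz.pdf"
wanted = [
    "AB", "AB-12", "-12-1993", "12-1993", "-1993", "1993",
    "AB-12-1993 xyz", "xyz", "xyz.pdf", "AB-12-1993 xyz.pdf",
]

# Emulate the 2_10_grams filter from the index definition above.
grams_2_10 = set(char_ngrams(text, 2, 10))
missing_2_10 = [t for t in wanted if t not in grams_2_10]
print(missing_2_10)  # ['AB-12-1993 xyz', 'AB-12-1993 xyz.pdf']

# Assumption: raising max_gram to the input length (18) covers the rest.
grams_2_18 = set(char_ngrams(text, 2, len(text)))
missing_2_18 = [t for t in wanted if t not in grams_2_18]
print(missing_2_18)  # []
```

If the longer tokens matter, the same check suggests setting max_gram to at least 18 for this input, with index.max_ngram_diff raised to at least 16 to allow it. Keep in mind that a wider gram range produces many more tokens per document and a larger index.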