I am trying to create an analyzer that returns all possible tokens. For example, for the input AB-12-1993 xyz.pdf
the generated tokens should include AB, AB-12, -12-1993, 12-1993, -1993, 1993, AB-12-1993 xyz, xyz, xyz.pdf, and AB-12-1993 xyz.pdf.
Extra tokens are not a problem, as long as these are generated.
I have tried the whitespace analyzer with ngram, but -12-1993, 12-1993, -1993, and 1993
are not generated.
I have also tried this, with different analyzers, but it did not help.
I am using Elasticsearch 8.3.3. Can somebody please help me out?
CodePudding user response:
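You can use the analyzer definition below. The keyword tokenizer keeps the whole input as a single token, and the ngram filter then emits every character n-gram of 2 to 10 characters. That covers the required tokens up to 10 characters long; for longer ones, such as the full file name, raise max_gram (and index.max_ngram_diff) accordingly.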
PUT ngram_custom_example
{
  "settings": {
    "index": {
      "max_ngram_diff": 10
    },
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "keyword",
          "filter": [ "2_10_grams" ]
        }
      },
      "filter": {
        "2_10_grams": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 10
        }
      }
    }
  }
}
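
To see which of the requested tokens a given min_gram/max_gram pair would cover, here is a minimal pure-Python sketch. It only approximates what a keyword tokenizer followed by an ngram filter does (every substring within the length bounds) and does not call Elasticsearch; the names `char_ngrams`, `missing_tokens`, `TEXT`, and `REQUIRED` are made up for this illustration.

```python
def char_ngrams(text, min_gram, max_gram):
    """Return every substring of `text` with length in [min_gram, max_gram].

    Assumption: this mirrors an ngram filter applied to one keyword token.
    """
    grams = set()
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break  # longer sizes from this start would also overflow
            grams.add(text[start:end])
    return grams


# Input and token list taken from the question.
TEXT = "AB-12-1993 xyz.pdf"
REQUIRED = [
    "AB", "AB-12", "-12-1993", "12-1993", "-1993", "1993",
    "AB-12-1993 xyz", "xyz", "xyz.pdf", "AB-12-1993 xyz.pdf",
]


def missing_tokens(min_gram, max_gram):
    """List the required tokens that the simulated n-grams do not contain."""
    grams = char_ngrams(TEXT, min_gram, max_gram)
    return [token for token in REQUIRED if token not in grams]


if __name__ == "__main__":
    print(missing_tokens(2, 10))  # settings used in the answer
    print(missing_tokens(2, 18))  # max_gram raised to the full input length
```

Under these assumptions, the 2 to 10 range misses only the two tokens longer than 10 characters (AB-12-1993 xyz and AB-12-1993 xyz.pdf), while 2 to 18 covers the whole list. In the index settings that would mean max_gram of at least 18 and index.max_ngram_diff of at least 16, at the cost of a noticeably larger index.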