I'm trying to enable search using the edge_ngram tokenizer. I followed the example in the tutorial and added the custom_token_chars setting as follows:
PUT test-ngrams
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ],
          "custom_token_chars": [
            "!"
          ]
        }
      }
    }
  }
}
I then tried analyzing text containing the ! character as follows:
POST test-ngrams/_analyze
{
  "analyzer": "my_analyzer",
  "text": "!Quick Foxes."
}
But the result I'm getting ignores the !:
{
  "tokens" : [
    {
      "token" : "Qu",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "Qui",
      "start_offset" : 1,
      "end_offset" : 4,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "Quic",
      "start_offset" : 1,
      "end_offset" : 5,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "Quick",
      "start_offset" : 1,
      "end_offset" : 6,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "Fo",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "Fox",
      "start_offset" : 7,
      "end_offset" : 10,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "Foxe",
      "start_offset" : 7,
      "end_offset" : 11,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : "Foxes",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "word",
      "position" : 7
    }
  ]
}
Answer:
Your tokenizer configuration is incomplete: the custom_token_chars setting only takes effect when custom is also included in the token_chars list.
Character classes may be any of the following:
- letter — for example a, b, ï or 京
- digit — for example 3 or 7
- whitespace — for example " " or "\n"
- punctuation — for example ! or "
- symbol — for example $ or √
- custom — custom characters which need to be set using the custom_token_chars setting.
Source: the official edge_ngram tokenizer documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html). With custom added to token_chars, the configuration becomes:
PUT test-ngrams
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit",
            "custom"
          ],
          "custom_token_chars": [
            "!"
          ]
        }
      }
    }
  }
}
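To verify, recreate the index with these settings and re-run the same _analyze request. The sketch below shows the expected shape of the result under the corrected configuration; the exact offsets and positions come from Elasticsearch, but the leading ! should now be kept because it is declared as a custom token character:

POST test-ngrams/_analyze
{
  "analyzer": "my_analyzer",
  "text": "!Quick Foxes."
}

Expected tokens (abridged sketch):

{
  "tokens" : [
    {
      "token" : "!Q",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "!Qu",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 1
    },
    ...
  ]
}

The remaining tokens should continue as "!Qui", "!Quic", "!Quick", followed by "Fo", "Fox", "Foxe", "Foxes" for the second word, as in your original output.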