I use a whitespace tokenizer with searchkick_stemmer:
"company" -> "compani"
"company " -> "company "
How can I make "company " become "compani " or ["compani", " "]?
I've tried edge n-grams, which work fine but generate too many tokens. I'm wondering whether there is another approach, such as conditional scripting or something similar.
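To make the "too many tokens" concern concrete, here is a minimal sketch in plain Python (not Elasticsearch itself) of what edge n-gramming does to a single token. The helper name edge_ngrams and the min_gram/max_gram values are assumptions chosen for illustration; they are not the settings actually used in the index.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of `token` whose lengths run from min_gram to max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


grams = edge_ngrams("company")
print(grams)       # ['c', 'co', 'com', 'comp', 'compa', 'compan', 'company']
print(len(grams))  # 7
```

With these assumed settings one word turns into seven indexed tokens, whereas stemming produces a single token ("compani").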
CodePudding user response:
I put together the example below, but I'd also recommend reading about the pattern token filters.
POST _analyze
{
  "tokenizer": "whitespace",
  "filter": [
    "stemmer"
  ],
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "[ ]",
      "replacement": " $0"
    }
  ],
  "text": [
    "company "
  ]
}
Tokens:
{
  "tokens": [
    {
      "token": "compani",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    },
    {
      "token": " ",
      "start_offset": 7,
      "end_offset": 8,
      "type": "word",
      "position": 1
    }
  ]
}
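If you want to reuse this outside of _analyze, the same pieces could be registered as a custom analyzer in the index settings. The following is only a sketch of the request body, built as a Python dict and printed as JSON; I have not run it against a cluster. The names space_pad and stem_keep_space are hypothetical, the pattern and replacement are copied from the request above, and the built-in "stemmer" filter stands in for searchkick_stemmer.

```python
import json

# Hypothetical names: "space_pad" (char filter) and "stem_keep_space" (analyzer)
# are illustrative and do not come from the original question.
index_body = {
    "settings": {
        "analysis": {
            "char_filter": {
                "space_pad": {
                    "type": "pattern_replace",
                    "pattern": "[ ]",        # same pattern as in the _analyze request
                    "replacement": " $0",    # same replacement as in the _analyze request
                }
            },
            "analyzer": {
                "stem_keep_space": {
                    "type": "custom",
                    "char_filter": ["space_pad"],
                    "tokenizer": "whitespace",
                    "filter": ["stemmer"],   # assumption: swap in searchkick_stemmer if needed
                }
            },
        }
    }
}

# Body to send when creating the index (PUT /<index-name>).
print(json.dumps(index_body, indent=2))
```

A field mapped with "analyzer": "stem_keep_space" would then go through the same char filter, tokenizer, and stemmer chain at index time.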