Elasticsearch: stemming words that end with symbols/special characters


I use a whitespace tokenizer with searchkick_stemmer:

"company" -> "compani"

"company " -> "company "

How can I make "company*" become "compani*" or ["compani", "*"]?

I've tried edge n-grams, and they work, but they generate too many tokens. I'm wondering whether there is another approach, such as conditional scripting.
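For reference, the behavior is easy to reproduce with a bare _analyze call. This is a minimal sketch that uses the built-in stemmer token filter as a stand-in for searchkick_stemmer:

POST _analyze
{
  "tokenizer": "whitespace",
  "filter": [
    "stemmer"
  ],
  "text": [
    "company company*"
  ]
}

This returns "compani" and "company*": the trailing symbol keeps the stemmer's suffix rules from matching the second token.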

CodePudding user response:

I put together this example, but I also recommend reading about the pattern_replace character filter.

POST _analyze
{
  "tokenizer": "whitespace",
  "filter": [
    "stemmer"
  ],
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "[*]",
      "replacement": " $0"
    }
  ],
  "text": [
    "company*"
  ]
}

Tokens:

{
  "tokens": [
    {
      "token": "compani",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    },
    {
      "token": " ",
      "start_offset": 7,
      "end_offset": 8,
      "type": "word",
      "position": 1
    }
  ]
}
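The _analyze call above is only an ad-hoc test. To apply the same chain at index time, the char filter can be registered in the index settings as part of a custom analyzer. A sketch, assuming ordinary index settings (the names split_trailing_symbols and stem_symbols are made up for illustration):

PUT my-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "split_trailing_symbols": {
          "type": "pattern_replace",
          "pattern": "[*]",
          "replacement": " $0"
        }
      },
      "analyzer": {
        "stem_symbols": {
          "type": "custom",
          "char_filter": ["split_trailing_symbols"],
          "tokenizer": "whitespace",
          "filter": ["stemmer"]
        }
      }
    }
  }
}

With this in place, "company*" is indexed as the two tokens "compani" and "*", so the word is searchable by its stem while the symbol is kept as a separate token.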