Elasticsearch: check how analyzers/tokenizers/filters applied to an index split text into tokens?


I'm quite new to Elasticsearch, so please forgive me if I'm overlooking something obvious or basic.

I'm now using Elasticsearch at work, and I want to see how the complex analyzer/tokenizer/filter settings, which were set up by my predecessors, split text into tokens.

I did some research and found a way to do it:

GET /_analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}

However, as I said, the analyzer/tokenizer/filter settings are so complicated that writing out the details every time I want to test them would slow me down horribly.

So I want to analyze a text with the analyzer/tokenizer/filter settings already applied to an index. Is there a way to do that?

I would appreciate it if anyone could shed some light on this.

CodePudding user response:

You don't have to supply the complete analyzer definition to the _analyze API every time. You can simply call the _analyze API on the index itself, like the following:

GET <your-index-name>/_analyze
{
  "analyzer" : "standard",
  "text" : "Quick Brown Foxes!"
}

So instead of using the _analyze API at the cluster level, you use it at the index level, where the analyzer definitions are already present. You only need to provide the analyzer name, not its full definition (tokenizer, filters, etc.), to get the tokens produced by that analyzer.
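For example, assuming the index defines a custom analyzer named my_custom_analyzer (a placeholder here; substitute whatever name your predecessors used, which you can find by inspecting the index settings), you could test it like this:

GET <your-index-name>/_settings

GET <your-index-name>/_analyze
{
  "analyzer" : "my_custom_analyzer",
  "text" : "this is a test"
}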

Refer to the official Elasticsearch documentation for examples of using it on a specific index or on a specific field.
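For instance, to see how the analyzer mapped to a particular field splits text (the field name title below is just an example), you can pass "field" instead of "analyzer" and Elasticsearch will resolve the analyzer from the index mapping:

GET <your-index-name>/_analyze
{
  "field" : "title",
  "text" : "this is a test"
}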

Hope this helps.
