ElasticSearch/OpenSearch: wildcard query doesn't work on a super long keyword field

Time:01-14

I'm quite new to ElasticSearch/OpenSearch, so if I overlook something basic, please forgive me.

I'm building a service that performs full-text search on novels. The novels index has the following fields:

  • title: keyword type, 10-40 characters long on average.
  • content: text type, in most documents far longer than 100,000 characters.

As an experiment, I changed the type of the content field to keyword, because I wanted to know the difference between the text type and the keyword type. I was not sure whether documents containing a super long keyword field would be indexed, but no problem occurred.

Then I tried the following wildcard query on it:

{
  "query": {
    "wildcard": {
      "content": "*あらゆる透明な幽霊の複合体*"
    }
  }
}

However, it didn't hit the document that contains the above phrase. After that, I tried a wildcard query on the title field, and it worked nicely. Then I tried a match_phrase query like the following, and it hit the proper document as well:

{
  "query": {
    "match_phrase": {
      "content": "あらゆる透明な幽霊の複合体"
    }
  }
}

So the only reason I could come up with for why the wildcard query didn't work on the content field is simply that it's super long.

Now, I have two questions.

  1. Is my hypothesis (that wildcard queries don't work on super long texts) correct?
  2. If my hypothesis is true, why doesn't a wildcard query work on super long texts? What's the mechanism under the hood?

I would deeply appreciate it if you could shed some light on this.

Incidentally, I'm using OpenSearch 1.2.4 on Docker.

Answer:

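The reason might be that your keyword field has an ignore_above parameter set to some value smaller than your content length, so any string longer than that will not be indexed. A wildcard query matches against the indexed terms, so a value that was skipped at index time can never be hit, no matter what the pattern is.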
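One way to check this, as a sketch rather than a definitive diagnosis: a keyword value skipped because of ignore_above leaves no indexed value for the field, so an exists query should not match that document. The query below assumes the field is named content, as in the question, and lists documents with no indexed content value. If the novel you were searching for shows up in the results, its content was most likely dropped at index time.

```json
{
  "query": {
    "bool": {
      "must_not": {
        "exists": { "field": "content" }
      }
    }
  }
}
```

It is also worth looking at the index mapping itself to see which ignore_above value, if any, is set on the content field.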

It's usually not a good idea to store long strings of text as keyword. You should prefer either match_only_text or the wildcard field type depending on your exact use case.

I think, however, that those two field types are not available on OpenSearch, so your only option is to play with the ignore_above parameter or simply use the text field type and use a match_phrase query.
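For the second option, a minimal mapping sketch might look like the following. It assumes the two fields described in the question and leaves the analyzer unspecified, since the question does not say which one is in use. Note that raising ignore_above has a ceiling: Lucene rejects any single term longer than 32766 bytes, so a keyword value of 100,000+ characters cannot be indexed as one term whatever ignore_above is set to.

```json
{
  "mappings": {
    "properties": {
      "title": { "type": "keyword" },
      "content": { "type": "text" }
    }
  }
}
```

With this mapping, the match_phrase query from the question is the one to use on content, while wildcard queries remain usable on the short title field.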
