I'm quite new to ElasticSearch/OpenSearch, so if I overlook something basic, please forgive me.
I'm making a service which performs full-text search on novels.
The novels index has the following fields:
title: keyword type, 10-40 characters long on average.
content: text type, in most documents far longer than 100,000 characters.
As an experiment, I changed the type of content from text to keyword, because I wanted to know the difference between the text type and the keyword type. I was not sure whether documents containing a super long keyword field would be indexed, but no problem occurred.
Then I tried the following wildcard query on it:
{
"query": {
"wildcard": {
"content": "*あらゆる透明な幽霊の複合体*"
}
}
}
However, it didn't hit the document which contains the above phrase.
After that, I tried a wildcard query on the title field, and it worked nicely. Then I tried a match_phrase query like the following, and it hit the proper document as well:
{
"query": {
"match_phrase": {
"content": "あらゆる透明な幽霊の複合体"
}
}
}
So the only reason I could come up with for why the wildcard query didn't work on the content field is simply that it's super long.
Now, I have two questions.
- Is my hypothesis (that wildcard queries don't work on super long texts) correct?
- If my hypothesis is true, why don't wildcard queries work on super long texts? What's the mechanism under the hood?
I would deeply appreciate it if you could shed some light on this.
Incidentally, I'm using OpenSearch 1.2.4 on Docker.
CodePudding user response:
The reason might be that your keyword field has an ignore_above parameter set to some value smaller than your content length. A keyword string longer than that value is not indexed, so a wildcard query cannot match it, but the document itself is still accepted without an error.
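One way to check this is to look at the mapping returned by GET novels/_mapping. The response below is only an illustration of what to look for, not the asker's actual mapping: it shows the shape that default dynamic mapping produces for a string field, where the keyword sub-field carries ignore_above set to 256. An explicitly created mapping may look different.

```json
{
  "novels": {
    "mappings": {
      "properties": {
        "content": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
```

If ignore_above shows up on the field you are querying with a value below the length of your content, values longer than that are skipped at index time and no wildcard pattern can reach them.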
It's usually not a good idea to store long strings of text as keyword. You should prefer either the match_only_text or the wildcard field type, depending on your exact use case.
I think, however, that those two field types are not available on OpenSearch, so your only option is to play with the ignore_above parameter or simply use the text field type with a match_phrase query.
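As a rough sketch of the second option, a request body for PUT novels could look like the following. The index and field names are taken from the question. Leaving out the analyzer is an assumption, since the question does not say which analyzer is used for the Japanese text.

```json
{
  "mappings": {
    "properties": {
      "title": { "type": "keyword" },
      "content": { "type": "text" }
    }
  }
}
```

With this mapping, the match_phrase query from the question runs against the analyzed content field. Note that raising ignore_above has a hard ceiling: Lucene rejects any single term longer than 32766 bytes, so a 100,000-character value cannot be indexed as one keyword term however the parameter is set.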