After I added a synonym analyzer to my_index, the index became case-sensitive
I have one property called nationality that uses a synonym analyzer. It seems that this property became case-sensitive because of the synonym analyzer.
Here is my /my_index/_mappings
{
  "my_index": {
    "mappings": {
      "items": {
        "properties": {
          .
          .
          .
          "nationality": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            },
            "analyzer": "synonym"
          },
          .
          .
          .
        }
      }
    }
  }
}
Inside the index, I have the value India COUNTRY. When I search for India nation using the command below, I get the result.
POST /my_index/_search
{
  "query": {
    "match": {
      "nationality": "India nation"
    }
  }
}
But when I search for india (notice that the letter i is lowercase), I get nothing.
My assumption is that this happens because I put the uppercase filter before the synonym filter. I did this because the synonyms are uppercased, so the query India becomes INDIA after passing through this filter.
Here is my /my_index/_settings
{
  "my_index": {
    "settings": {
      "index": {
        "number_of_shards": "1",
        "provided_name": "my_index",
        "similarity": {
          "default": {
            "type": "BM25",
            "b": "0.9",
            "k1": "1.8"
          }
        },
        "creation_date": "1647924292297",
        "analysis": {
          "filter": {
            "synonym": {
              "type": "synonym",
              "lenient": "true",
              "synonyms": [
                "NATION, COUNTRY, FLAG"
              ]
            }
          },
          "analyzer": {
            "synonym": {
              "filter": [
                "uppercase",
                "synonym"
              ],
              "tokenizer": "whitespace"
            }
          }
        },
        "number_of_replicas": "1",
        "version": {
          "created": "6080099"
        }
      }
    }
  }
}
Is there a way to keep this property case-insensitive? All the solutions I've found only say that I should set all the text inside nationality to be either lowercase or uppercase. But what if I have both uppercase and lowercase letters inside the index?
CodePudding user response:
Did you apply the synonym filter after adding your data to the index?
If so, the phrase "India COUNTRY" was probably indexed exactly as "India COUNTRY". When you sent the match query, your query was analyzed into uppercase tokens (effectively "INDIA COUNTRY", since NATION is a synonym of COUNTRY) because the uppercase filter is now in place. It matched because a match query only needs one of the terms to match, and "COUNTRY" provides that.
But when you send the one-word query "india", it is analyzed and converted to "INDIA" by the uppercase filter, and there is no matching term in your index. You only have a document containing "India COUNTRY".
My answer relies on some assumptions. I hope it helps you understand your problem.
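One way to check these assumptions is to ask Elasticsearch which tokens the analyzer produces for a given text. A minimal sketch, assuming the index and analyzer names from the question (my_index and synonym):

```
POST /my_index/_analyze
{
  "analyzer": "synonym",
  "text": "india"
}
```

The response lists the tokens for the text as the analyzer is configured right now. It does not show what was stored for documents that were indexed before the analyzer was changed, so a mismatch between the two would be consistent with the explanation above.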
CodePudding user response:
I have found the solution!
I didn't realize that the filters defined in the settings are applied both when indexing the data and when searching it. At first, I did these steps:
- Create the index with the synonym filter
- Insert data
- Add the uppercase filter before the synonym filter
By doing that, the uppercase filter was not applied to the data I had already inserted. What I should have done is:
- Create the index with the uppercase & synonym filters (pay attention to the order)
- Insert data
Then the filters are applied to my data.
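The corrected order could look like the sketch below. It reuses the analysis settings and the nationality mapping from the question, in Elasticsearch 6.x syntax with the items mapping type; the other settings are omitted, and the document ID 1 is an arbitrary choice for illustration.

```
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym": {
          "type": "synonym",
          "lenient": "true",
          "synonyms": [
            "NATION, COUNTRY, FLAG"
          ]
        }
      },
      "analyzer": {
        "synonym": {
          "tokenizer": "whitespace",
          "filter": [
            "uppercase",
            "synonym"
          ]
        }
      }
    }
  },
  "mappings": {
    "items": {
      "properties": {
        "nationality": {
          "type": "text",
          "analyzer": "synonym"
        }
      }
    }
  }
}

PUT /my_index/items/1
{
  "nationality": "India COUNTRY"
}
```

Because the analyzer already contains both filters when the document is inserted, the stored terms and the query terms go through the same chain. For an index that already holds data, the existing documents would have to be reindexed for the changed analyzer to take effect on them.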