Elasticsearch became case-sensitive after adding a synonym analyzer

Time:03-23

After I added a synonym analyzer to my_index, the index became case-sensitive.

I have one property called nationality that uses a synonym analyzer. It seems that this property became case-sensitive because of the synonym analyzer.

Here is my /my_index/_mappings:

{
  "my_index": {
    "mappings": {
      "items": {
        "properties": {
          .
          .
          .
          "nationality": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            },
            "analyzer": "synonym"
          },
          .
          .
          .
        }
      }
    }
  }
}

The index contains a document with the value India COUNTRY. When I search for India nation using the command below, I get the result.

POST /my_index/_search
{
  "query": {
    "match": {
      "nationality": "India nation"
    }
  }
}

But when I search for india (note the lowercase i), I get nothing. My assumption is that this happens because I put the uppercase filter before the synonym filter. I did this because the synonyms are uppercase, so the query India becomes INDIA after passing through this filter.

Here is my /my_index/_settings:

{
  "my_index": {
    "settings": {
      "index": {
        "number_of_shards": "1",
        "provided_name": "my_index",
        "similarity": {
          "default": {
            "type": "BM25",
            "b": "0.9",
            "k1": "1.8"
          }
        },
        "creation_date": "1647924292297",
        "analysis": {
          "filter": {
            "synonym": {
              "type": "synonym",
              "lenient": "true",
              "synonyms": [
                "NATION, COUNTRY, FLAG"
              ]
            }
          },
          "analyzer": {
            "synonym": {
              "filter": [
                "uppercase",
                "synonym"
              ],
              "tokenizer": "whitespace"
            }
          }
        },
        "number_of_replicas": "1",
        "version": {
          "created": "6080099"
        }
      }
    }
  }
}

Is there a way to keep this property case-insensitive? All the solutions I've found only say that I should make all the text inside nationality either lowercase or uppercase. But what if the index contains both uppercase and lowercase letters?

CodePudding user response:

Did you apply the synonym filter after adding your data to the index?

If so, the phrase "India COUNTRY" was probably indexed exactly as "India COUNTRY". When you send a match query to the index, the query text "India nation" is analyzed with the current analyzer: the uppercase filter you now have turns it into "INDIA NATION", and the synonym filter expands NATION to include COUNTRY. The query matches because a match query only needs one of its terms to match, and the term COUNTRY provides that.

But when you send the one-word query "india", it is analyzed and converted to "INDIA" by your uppercase filter, and no term in your index matches it. You only have a document containing "India COUNTRY".

My answer relies on some assumptions, but I hope it helps you understand the problem.
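
One way to check this assumption is the _analyze API, which returns the tokens an analyzer produces for a given text. As a minimal sketch, assuming the index and analyzer names from the question (my_index and synonym), the body below can be sent to POST /my_index/_analyze:

```json
{
  "analyzer": "synonym",
  "text": "india"
}
```

With the settings shown in the question, the uppercase filter should turn the query text into the token INDIA. A document that was indexed before the uppercase filter was added would still hold the term India, which is not equal to INDIA, so the search finds nothing.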

CodePudding user response:

I have found the solution!

I didn't realize that the filter I configured in the settings is applied both when indexing the data and when searching it. At first, I followed these steps:

  1. Create index with synonym filter
  2. Insert data
  3. Add uppercase before synonym filter

By doing that, the uppercase filter was not applied to my data. What I should have done is:

  1. Create index with uppercase & synonym filter (pay attention to the order)
  2. Insert data

Then the filter will be applied to my data.
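
The corrected order could be sketched as a single index-creation request. This is only an illustration built from the settings and mappings shown in the question, assuming Elasticsearch 6.8 (hence the items mapping type) and assuming my_index does not exist yet. The body would be sent to PUT /my_index before any document is inserted; "type": "custom" is written out explicitly here, although the original settings omit it:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym": {
          "type": "synonym",
          "lenient": true,
          "synonyms": ["NATION, COUNTRY, FLAG"]
        }
      },
      "analyzer": {
        "synonym": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["uppercase", "synonym"]
        }
      }
    }
  },
  "mappings": {
    "items": {
      "properties": {
        "nationality": {
          "type": "text",
          "analyzer": "synonym",
          "fields": {
            "keyword": { "type": "keyword", "ignore_above": 256 }
          }
        }
      }
    }
  }
}
```

Because the analyzer is in place before the data arrives, both the indexed terms and the query terms are uppercased, so India, india and INDIA all end up as the same term. Documents that were indexed before an analyzer change keep their old terms until they are indexed again. Re-inserting them, or running POST /my_index/_update_by_query, is one way to do that.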