Synonyms relevance issue in Elasticsearch-CodePudding

I am trying to configured synonyms in elasticsearch and done the sample configuration as well. But not getting expected relevancy when i am searching data. Below is index Mapping configuration:

PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_synonyms": {
            "type": "synonym",
            "synonyms": [
              "mind, brain",
              "brainstorm,brain storm"
            ]
          }
        },
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase"
            ]
          },
          "my_search_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_synonyms"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Below is sample data which i have indexed:

POST test_index/_bulk
{ "index" : { "_id" : "1" } }
{"my_field": "This is a brainstorm" }
{ "index" : { "_id" : "2" } }
{"my_field": "A different brain storm" }
{ "index" : { "_id" : "3" } }
{"my_field": "About brainstorming" }
{ "index" : { "_id" : "4" } }
{"my_field": "I had a storm in my brain" }
{ "index" : { "_id" : "5" } }
{"my_field": "I envisaged something like that" }

Below is query which i am trying:

GET test_index/_search
{
  "query": {
    "match": {
      "my_field": {
        "query": "brainstorm",
         "analyzer": "my_search_analyzer"
      }
    }
  }
}

Current Result:

 "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.8185701,
        "_source" : {
          "my_field" : "A different brain storm"
        }
      },
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.4100728,
        "_source" : {
          "my_field" : "I had a storm in my brain"
        }
      },
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.90928507,
        "_source" : {
          "my_field" : "This is a brainstorm"
        }
      }
    ]

I am expecting document which is matching exect with query on top and document which is matching with synonyms should come with low score. so here my expectation is document with value "This is a brainstorm" should come at position one.

Could you please suggest me how i can achive.

I have tried to applied boosting and weightage as well but no luck.

Thanks in advance !!!

CodePudding user response：

Elasticsearch "replaces" every instance of a synonym all other synonyms, and does so on both indexing and searching (unless you provide a separate search_analyzer) so you're losing the exact token. To keep this information, use a subfield with standard analyzer and then use multi_match query to match either synonyms or exact value boost the exact field.

CodePudding user response：

I have got answer from Elastic Forum here. I have copied below for quick referance.

Hello there,

Since you are indexing synonyms into your inverted index, brain storm and brainstorm are all different tokens after analyzer does its thing. So Elasticsearch on query time uses your analyzer to create tokens for brain, storm and brainstorm from your query and match multiple tokens with indexes 2 and 4, your index 2 has lesser words so tf/idf scores it higher between the two and index number 1 only matches brainstorm.

You can also see what your analyzer does to your input with this;

POST test_index/_analyze
{
  "analyzer": "my_search_analyzer",
  "text": "I had a storm in my brain"
}

I did some trying out so, you should change your index analyzer to my_analyzer;

PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_synonyms": {
            "type": "synonym",
            "synonyms": [
              "mind, brain",
              "brainstorm,brain storm"
            ]
          }
        },
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase"
            ]
          },
          "my_search_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "my_synonyms"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Then you want to boost your exact matches, but you also want to get hits from my_search_analyzer tokens as well so i have changed your query a bit;

GET test_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "my_field": {
              "query": "brainstorm",
              "analyzer": "my_search_analyzer"
            }
          }
        },
        {
          "match_phrase": {
            "my_field": {
              "query": "brainstorm"
            }
          }
        }
      ]
    }
  }
}

result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 2.3491273,
    "hits" : [
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.3491273,
        "_source" : {
          "my_field" : "This is a brainstorm"
        }
      },
      {
        "_index" : "test_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.8185701,
        "_source" : {
          "my_field" : "A different brain storm"
        }
      }
    ]
  }
 }