Searching for exact phrase with synonyms

Time:11-25

I am trying to build a query that combines exact phrase matching with synonyms, and I can't figure it out. Also, with the wildcard approach I don't know how to apply fuzziness. Is that even possible with wildcards? Ideally I would get the same results for the terms "call of duty", "cod", and "call of dutz".

I have created this index:

PUT exact_search
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0",
      "analysis": {
        "analyzer": {
          "analyzer_exact": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "icu_folding",
              "synonyms"
            ]
          }
        },
        "filter": {
          "synonyms": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword",
        "fields": {
          "analyzer_exact": {
            "type": "text",
            "analyzer": "analyzer_exact"
          }
        }
      }
    }
  }
}

And I fill it with these items:

POST exact_search/_doc/1
{
  "name": "Hoodie Call of Duty"
}
POST exact_search/_doc/2
{
  "name": "Call of Duty 2"
}
POST exact_search/_doc/3
{
  "name": "Call of Duty: Modern Warfare 2"
}
POST exact_search/_doc/4
{
  "name": "COD: Modern Warfare 2"
}
POST exact_search/_doc/5
{
  "name": "Call of duty"
}
POST exact_search/_doc/6
{
  "name": "Call of the sea"
}
POST exact_search/_doc/7
{
  "name": "Heavy Duty"
}

synonyms.txt looks like this:

cod,call of duty

What I am trying to achieve is to get all of the results (except "Call of the sea" and "Heavy Duty") when I search for "call of duty" or "cod".

So far I have constructed this query, but it does not work as expected for the search term "cod" (the term "call of duty" works fine):

GET exact_search/_search
{
  "explain": false, 
  "query":{
    "bool":{
      "must":[
         {
           "wildcard": {
             "name.analyzer_exact": {
               "value": "*cod*"
             }
           }
         }
      ]
    }
  }
}

But the result is only two items:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "exact_search",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "name" : "COD: Modern Warfare 2"
        }
      },
      {
        "_index" : "exact_search",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "name" : "Call of duty"
        }
      }
    ]
  }
}

It looks like the synonyms are working, because the query returns the "Call of duty" game, but the wildcards seem to be ignored: it won't return "Call of Duty 2", for example.

I need an exact phrase match, because I don't want to get "Heavy Duty" or "Call of the sea" as results (which happens when the words "call" and "duty" are matched individually).
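One way to reason about the two hits is to model what the index might contain. The Python sketch below is only a simplified simulation, not Elasticsearch itself: it assumes the keyword tokenizer keeps each name as one lowercased token, that the synonym rule fires only when that whole token equals "cod" or "call of duty", and that the wildcard pattern is compared against those tokens as-is. It ignores icu_folding, and the helper names (keyword_tokens, wildcard_hits) are made up for illustration.

```python
from fnmatch import fnmatchcase

# Names indexed in the question, keyed by document id.
DOCS = {
    1: "Hoodie Call of Duty",
    2: "Call of Duty 2",
    3: "Call of Duty: Modern Warfare 2",
    4: "COD: Modern Warfare 2",
    5: "Call of duty",
    6: "Call of the sea",
    7: "Heavy Duty",
}

# The single synonym group from synonyms.txt ("cod,call of duty").
SYNONYMS = {"cod", "call of duty"}


def keyword_tokens(name):
    """Simplified model of analyzer_exact (keyword + lowercase + synonyms)."""
    token = name.lower()  # the whole name stays a single token
    # Assumption: the rule applies only if the entire token is a synonym.
    return SYNONYMS if token in SYNONYMS else {token}


def wildcard_hits(pattern):
    """Ids of documents with at least one token matching the pattern."""
    return sorted(
        doc_id
        for doc_id, name in DOCS.items()
        if any(fnmatchcase(tok, pattern) for tok in keyword_tokens(name))
    )


hits = wildcard_hits("*cod*")
print(hits)  # [4, 5]
```

Under these assumptions, document 5 matches only because its whole name is a synonym of "cod", while "call of duty 2" is a different single token that does not contain the letters "cod".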

Thank you for pointing me in the right direction.

CodePudding user response:

I doubt the synonym filter can generate the synonym tokens while analyzer_exact uses "tokenizer": "keyword", because the keyword tokenizer emits the whole name as a single token. I would change a few things to make it work.

  1. Change the tokenizer from keyword to standard:

      "analyzer_exact": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "synonyms"
        ]
      }
    
  2. I would use a match_phrase query to exclude names that contain neither "call of duty" nor "cod":

     {
       "match_phrase": {
         "name.analyzer_exact": "cod"
       }
     }
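To illustrate why these two changes could give the desired result, here is a rough Python sketch. It is a simplified model and not how Elasticsearch implements synonyms or match_phrase: it assumes word-level tokens (roughly what the standard tokenizer plus lowercase would produce), collapses "call of duty" to "cod" instead of expanding synonyms, and treats a phrase match as a contiguous run of tokens. The names tokens, contains_phrase and phrase_hits are hypothetical helpers.

```python
import re

# Names indexed in the question, keyed by document id.
DOCS = {
    1: "Hoodie Call of Duty",
    2: "Call of Duty 2",
    3: "Call of Duty: Modern Warfare 2",
    4: "COD: Modern Warfare 2",
    5: "Call of duty",
    6: "Call of the sea",
    7: "Heavy Duty",
}

# Variants of the synonym group "cod,call of duty", as word lists.
VARIANTS = [["cod"], ["call", "of", "duty"]]
CANONICAL = "cod"  # every variant is collapsed to this token


def tokens(text):
    """Split into lowercase words and collapse synonym variants."""
    words = re.findall(r"\w+", text.lower())
    out, i = [], 0
    while i < len(words):
        for variant in VARIANTS:
            if words[i:i + len(variant)] == variant:
                out.append(CANONICAL)
                i += len(variant)
                break
        else:  # no variant starts at position i
            out.append(words[i])
            i += 1
    return out


def contains_phrase(doc_tokens, query_tokens):
    """True if query_tokens occurs as a contiguous run in doc_tokens."""
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def phrase_hits(query):
    """Ids of documents whose name contains the query as a phrase."""
    q = tokens(query)
    return sorted(
        doc_id for doc_id, name in DOCS.items()
        if contains_phrase(tokens(name), q)
    )


print(phrase_hits("cod"))           # [1, 2, 3, 4, 5]
print(phrase_hits("call of duty"))  # [1, 2, 3, 4, 5]
```

In this model "Call of the sea" and "Heavy Duty" drop out because neither contains the full phrase, which is the behaviour the match_phrase query is meant to provide.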
    

Response after the changes:

{
  "hits": {
    "hits": [
      {
        "_source": {
          "name": "Call of duty"
        }
      },
      {
        "_source": {
          "name": "COD: Modern Warfare 2"
        }
      },
      {
        "_source": {
          "name": "Call of Duty 2"
        }
      },
      {
        "_source": {
          "name": "hoddies Call of Duty"
        }
      },
      {
        "_source": {
          "name": "Call of Duty: Modern Warfare 2"
        }
      }
    ]
  }
}