How to match multiword terms?

Time:04-03

I need to build some more complex queries. Besides matching free text in the html field, a document must also match at least one keyword from a list. This works when the list contains only single-word strings, but multiword strings in the list never match. I can't split them into separate words because that could change the results.

This is what I tried so far.

{
   "from":page_num * size - size,
   "size": size,
   "query":{
      "bool":{
         "must":[
            {
               "match":{
                  "html":{
                     "query":"some query",
                     "operator":"and"
                  }
               }
            },
            {
               "terms":{
                  "keywords":[
                     "word",
                     "two words",
                     "another words"
                  ]
               }
            }
         ]
      }
   }
}

CodePudding user response:

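If you have not explicitly defined a mapping, add .keyword to the keywords field name (note the ".keyword" suffix in the query below). With the default dynamic mapping, keywords is indexed as an analyzed text field, so "two words" is stored as the separate tokens "two" and "words". The terms query does not analyze its input, so it finds no token equal to the whole phrase. The keywords.keyword sub-field stores each value unanalyzed, so multiword terms match exactly.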

{
   "from":page_num * size - size,
   "size": size,
   "query":{
      "bool":{
         "must":[
            {
               "match":{
                  "html":{
                     "query":"some query",
                     "operator":"and"
                  }
               }
            },
            {
               "terms":{
                  "keywords.keyword":[
                     "word",
                     "two words",
                     "another words"
                  ]
               }
            }
         ]
      }
   }
}
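
For reference, this is roughly what the default dynamic mapping produces for a string field, which is where the keywords.keyword sub-field comes from. It is an abridged sketch that assumes no custom mapping or dynamic templates are in place; the actual mapping of your index can be checked with the get mapping API.

```json
{
  "properties": {
    "keywords": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}
```

With the default ignore_above of 256, values longer than 256 characters are not indexed in the keyword sub-field, so very long keywords would not match there.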

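However, if you want to index the keywords field as both keyword and text, you can define it with multi-fields, as in the mapping below. Here keywords itself is a keyword field, so the terms query can target keywords directly, while keywords.raw holds the analyzed text version: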

PUT /_mapping
{
  "properties": {
    "keywords": {
      "type": "keyword",
      "fields": {
        "raw": {
          "type": "text"
        }
      }
    }
  }
}

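Then reindex the data. Note that the type of an existing field cannot be changed in place, so if keywords is already mapped as text, this mapping has to be applied to a new index and the data reindexed into it.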
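
A minimal sketch of the reindex step, sent as the body of POST /_reindex. The index names old-index and new-index are placeholders, not names from the question, and it assumes new-index has already been created with the multi-field mapping for keywords.

```json
{
  "source": {
    "index": "old-index"
  },
  "dest": {
    "index": "new-index"
  }
}
```

After the reindex finishes, point the search at new-index (or at an alias for it) so the terms query runs against the keyword-typed field.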
