How do I sort using the best matching nested field or a default in Elasticsearch?

Time:09-22

I have a bunch of documents that look like this in my index:

{
    "given_name":"John",
    "family_name":"Smith",
    "email_addresses": [
        {
          "email_address":"[email protected]",
          "primary":true
        },
        {
          "email_address":"[email protected]",
          "primary":false
        },
         {
          "email_address":"[email protected]",
          "primary":false
        },
         {
          "email_address":"[email protected]",
          "primary":false
        }
      ]
}

The mapping looks like this:

{
   "mappings":{
      "properties":{
         "given_name":{
            "type":"keyword",
            "fields":{
               "search":{
                  "type":"search_as_you_type"
               }
            }
         },
         "family_name":{
            "type":"keyword",
            "fields":{
               "search":{
                  "type":"search_as_you_type"
               }
            }
         },
         "email_addresses":{
            "type":"nested",
            "properties":{
               "email_address":{
                  "type":"keyword",
                  "fields":{
                     "search":{
                        "type":"search_as_you_type"
                     }
                  }
               },
               "primary":{
                  "type":"boolean"
               }
            }
         }
      }
   }
}

I am running a prefix search on given_name, family_name and email_addresses, so that relevant results from those fields start coming back as the user types:

{
   "query":{
      "bool":{
         "should":[
            {
               "nested":{
                  "path":"email_addresses",
                  "query":{
                     "prefix":{
                        "email_addresses.email_address.search": {
                          "value":"j"
                        }
                     }
                  }
               }
            },
            {
               "multi_match":{
                  "query":"j",
                  "fields":[
                     "given_name.search",
                     "family_name.search"
                  ],
                  "type": "bool_prefix"
               }
            }
         ]
      }
   }
}

I'd like to sort the results of the above query by the best matching email_address in email_addresses when one or more of them match. Otherwise, the sort should use the email_address under email_addresses where primary is true.
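Expressed as plain Python, the requested sort key might look like the sketch below. It is only an illustration of the requirement, not Elasticsearch code: desired_sort_key is a hypothetical helper, "best matching" is assumed to mean the first matching address in ascending order, the match is approximated with str.startswith on the whole address, and the example.com addresses are made up.

```python
from typing import Optional


def desired_sort_key(doc: dict, prefix: str) -> Optional[str]:
    """Return the address a document should be sorted by (hypothetical helper)."""
    emails = doc.get("email_addresses", [])

    # Addresses that match the typed prefix, in ascending order.
    matches = sorted(
        e["email_address"] for e in emails if e["email_address"].startswith(prefix)
    )
    if matches:
        # Assumption: "best matching" means the first match in ascending order.
        return matches[0]

    # No match: fall back to the address flagged as primary, if there is one.
    primary = [e["email_address"] for e in emails if e["primary"]]
    return primary[0] if primary else None


# Made-up example document.
example_doc = {
    "given_name": "John",
    "family_name": "Smith",
    "email_addresses": [
        {"email_address": "zed@example.com", "primary": True},
        {"email_address": "john.b@example.com", "primary": False},
        {"email_address": "john.a@example.com", "primary": False},
    ],
}

matched_key = desired_sort_key(example_doc, "john")
fallback_key = desired_sort_key(example_doc, "x")
```

With the prefix "john" the key is a matching address; with a prefix that matches nothing, the key falls back to the primary address.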

I have looked into script-based sorting, but I couldn't find any way in the documentation to access the matched nested child from a script.

Is there any way to achieve this?

CodePudding user response:

To do this, we can use a bool query as the filter of a nested sort.

Given we have the following 4 documents:

{
    "given_name":"John",
    "family_name":"Smith1",
    "email_addresses": [
        {
          "email_address":"[email protected]",
          "primary":true
        },
        {
          "email_address":"[email protected]",
          "primary":false
        },
         {
          "email_address":"[email protected]",
          "primary":false
        },
         {
          "email_address":"someguy53gmail.com",
          "primary":false
        }
      ]
}


{
    "given_name":"John",
    "family_name":"Smith2",
    "email_addresses": [
        {
          "email_address":"[email protected]",
          "primary":true
        },
        {
          "email_address":"[email protected]",
          "primary":false
        },
         {
          "email_address":"[email protected]",
          "primary":false
        },
         {
          "email_address":"someguy56gmail.com",
          "primary":false
        }
      ]
}

{
    "given_name":"John",
    "family_name":"Smith3",
    "email_addresses": [
        {
          "email_address":"[email protected]",
          "primary":true
        },
        {
          "email_address":"[email protected]",
          "primary":false
        },
         {
          "email_address":"[email protected]",
          "primary":false
        },
         {
          "email_address":"someguy46gmail.com",
          "primary":false
        }
      ]
}

{
    "given_name":"John",
    "family_name":"Smith4",
    "email_addresses": [
        {
          "email_address":"[email protected]",
          "primary":true
        },
        {
          "email_address":"[email protected]",
          "primary":false
        },
         {
          "email_address":"[email protected]",
          "primary":false
        },
         {
          "email_address":"someguy42gmail.com",
          "primary":false
        }
      ]
}

We can write our query like so:

{
   "query":{
      "bool":{
         "should":[
            {
               "nested":{
                  "path":"email_addresses",
                  "query":{
                     "prefix":{
                        "email_addresses.email_address.search":{
                           "value":"john"
                        }
                     }
                  }
               }
            },
            {
               "multi_match":{
                  "query":"john",
                  "fields":[
                     "given_name.search",
                     "family_name.search"
                  ],
                  "type":"bool_prefix"
               }
            }
         ]
      }
   },
   "sort":[
      {
         "email_addresses.email_address":{
            "order":"asc",
            "nested":{
               "path":"email_addresses",
               "filter":{
                  "bool":{
                     "should":[
                        {
                           "prefix":{
                              "email_addresses.email_address.search":{
                                 "value":"john"
                              }
                           }
                        },
                        {
                           "term":{
                              "email_addresses.primary": true
                           }
                        }
                     ]
                  }
               }
            }
         }
      }
   ]
}

First we do a prefix search on email_addresses.email_address, given_name and family_name.

Then we sort on the nested email_addresses.email_address field, using a filter that decides which nested objects may contribute a sort value:

  • An email address that matches our prefix query.
  • The email address where email_addresses.primary is true.

The way this works is that the bool filter in the nested sort keeps a nested object if it matches at least one of the clauses under should, so for each document the candidates are its matching addresses plus its primary address. Because the order is asc, the default sort mode is min, and the smallest candidate becomes the document's sort value. Note that this is an OR rather than a strict fallback: the primary address stays among the candidates even when other addresses match the prefix. Documents with no nested object matching either clause have no sort value and are placed last by default.
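A rough way to see which value the sort above picks for each document is to simulate it in plain Python. This is a sketch under assumptions, not Elasticsearch code: nested_sort_value and sort_docs are hypothetical helpers, the prefix match is approximated with str.startswith on the whole address (the real .search field is analyzed), and the example.com addresses are made up. It relies on the documented nested sort behaviour as I understand it: the filter keeps any nested object matching at least one should clause, asc order defaults to mode min, and missing values sort last.

```python
from typing import Optional


def nested_sort_value(doc: dict, prefix: str) -> Optional[str]:
    """Approximate the sort value for "order": "asc" with the bool/should filter."""
    candidates = [
        e["email_address"]
        for e in doc.get("email_addresses", [])
        # A nested object is kept if it matches either should clause.
        if e["email_address"].startswith(prefix) or e["primary"]
    ]
    # asc order defaults to mode "min"; no candidates means a missing value.
    return min(candidates) if candidates else None


def sort_docs(docs: list, prefix: str) -> list:
    """Sort ascending by the simulated value, placing missing values last."""
    def key(doc: dict):
        value = nested_sort_value(doc, prefix)
        return (value is None, value or "")
    return sorted(docs, key=key)


# Made-up documents, trimmed to the fields that matter for sorting.
docs = [
    {"family_name": "Smith1", "email_addresses": [
        {"email_address": "zed@example.com", "primary": True},
        {"email_address": "john.a@example.com", "primary": False}]},
    {"family_name": "Smith2", "email_addresses": [
        {"email_address": "adam@example.com", "primary": True},
        {"email_address": "john.b@example.com", "primary": False}]},
    {"family_name": "Smith3", "email_addresses": [
        {"email_address": "mary@example.com", "primary": True}]},
    {"family_name": "Smith4", "email_addresses": []},
]

ordered = [d["family_name"] for d in sort_docs(docs, "john")]
```

Here Smith2 sorts by its primary address adam@example.com even though john.b@example.com matches the prefix, which shows that the filter behaves as an OR. If a strict "matching address first, primary only as a fallback" order is required, this sort alone does not guarantee it.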
