Using NestedPath in Script Sort Elastic Search doesn't allow accessing outer properties

Time:04-12

I need to sort by a single value that a script computes from two parts. For each document, the minimum of the distances from a given location to the HQ and to each office is calculated and returned for sorting. Since the sort script can return only one value, I need to combine the script that calculates the distance between the HQ and the given location with the one that does the same for the multiple offices.

I tried to combine them, but offices is a nested property and headquarters is a non-nested property. If I use "NestedPath", I am not able to access the headquarters property. Without "NestedPath", I am not able to use the offices property. Here is the mapping:

         "offices" : {
            "type" : "nested",
            "properties" : {
              "coordinates" : {
                "type" : "geo_point",
                "fields" : {
                  "raw" : {
                    "type" : "text",
                    "index" : false
                  }
                },
                "ignore_malformed" : true
              },
              "state" : {
                "type" : "text"
              }
            }
          },
        "headquarters" : {
            "properties" : {
              "coordinates" : {
                "type" : "geo_point",
                "fields" : {
                  "raw" : {
                    "type" : "text",
                    "index" : false
                  }
                },
                "ignore_malformed" : true
              },
              "state" : {
                "type" : "text"
              }
            }
          }

And here is the script that I tried :

 "sort": [
    {
      "_script": {
        "nested" : {
          "path" : "offices"
        },
        "order": "asc",
        "script": {
          "lang": "painless",
          "params": {
            "lat": 28.9672,
            "lon": -98.4786
          },
          "source": "def hqDistance = 1000000;if (!doc['headquarters.coordinates'].empty){hqDistance = doc['headquarters.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371;} def officeDistance= doc['offices.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371; if (hqDistance < officeDistance) { return hqDistance; } return officeDistance;"
        },
        "type": "Number"
      }
    }
  ],

When I run the script, the headquarters logic does not seem to be executed at all; I get results based only on the offices distance.

Can someone help me with this? Thank you.

CodePudding user response:

Nested fields operate in a separate context and their content cannot be accessed from the outer level, nor vice versa.

You can, however, access a document's raw _source.

But there's a catch:

  • See, when iterating under the offices nested path, you were able to call .arcDistance because the coordinates are of type ScriptDocValues.GeoPoint.
  • But once you access the raw _source, you'll be dealing with an unoptimized set of java.util.ArrayLists and java.util.HashMaps.

This means that even though you can iterate an array list:

...
for (def office : params._source['offices']) {
   // office.coordinates is whatever was indexed, e.g. a trivial HashMap of {lat, lon}, not a GeoPoint!
}

calculating geo distances won't be directly possible…

…unless you write your own geoDistance function -- which is perfectly fine with Painless, but it'll need to be defined at the top of a script.

No need to reinvent the wheel though: see the Stack Overflow question "Calculating distance between two points, using latitude longitude?"
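For reference, the sample implementation below relies on the spherical law of cosines. As a sketch, with latitudes \varphi and longitudes \lambda converted to radians, the distance it computes can be written as follows. The factor 60 is nautical miles per degree of arc, and 1.1515 is the constant that implementation uses to convert nautical miles to statute miles:

```latex
\[
d_{\text{miles}} = \frac{180}{\pi}\,
\arccos\!\bigl(\sin\varphi_1 \sin\varphi_2
  + \cos\varphi_1 \cos\varphi_2 \cos(\lambda_1 - \lambda_2)\bigr)
\times 60 \times 1.1515
\]
```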

A sample implementation

Assuming your documents look like this:

POST my-index/_doc
{
  "offices": [
    {
      "coordinates": "39.9,-74.92",
      "state": "New Jersey"
    }
  ],
  "headquarters": {
    "coordinates": {
      "lat": 40.7128,
      "lon": -74.006
    },
    "state": "NYC"
  }
}

your sorting script could look like this:

GET my-index/_search
{
   "sort": [
    {
      "_script": {
        "order": "asc",
        "script": {
          "lang": "painless",
          "params": {
            "lat": 28.9672,
            "lon": -98.4786
          },
          "source": """
            // We can declare functions at the beginning of a Painless script
            // https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-functions.html#painless-functions
            
            double deg2rad(double deg) {
              return (deg * Math.PI / 180.0);
            }
            
            double rad2deg(double rad) {
              return (rad * 180.0 / Math.PI);
            }
            
            // https://stackoverflow.com/a/3694410/8160318
            double geoDistanceInMiles(def lat1, def lon1, def lat2, def lon2) {
              double theta = lon1 - lon2;
              double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2)) + Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta));
              dist = Math.acos(dist);
              dist = rad2deg(dist);
              return dist * 60 * 1.1515;
            }

            // start off arbitrarily high            
            def hqDistance = 1000000;

            if (!doc['headquarters.coordinates'].empty) {
              hqDistance = doc['headquarters.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371;
            }
            
            // assume office distance as large as hq distance
            def officeDistance = hqDistance;
            
            // iterate over each office and compare it to the currently lowest officeDistance
            for (def office : params._source['offices']) {
              // the coordinates are formatted as "lat,lon" so let's split...
              def latLong = Arrays.asList(office.coordinates.splitOnToken(","));
              // ...and parse them before passing onwards
              def tmpOfficeDistance = geoDistanceInMiles(Float.parseFloat(latLong[0]),
                                                         Float.parseFloat(latLong[1]),
                                                         params.lat,
                                                         params.lon);
              // we're interested in the nearest office...
              if (tmpOfficeDistance < officeDistance) {
                officeDistance = tmpOfficeDistance;
              }
            }
            
            if (hqDistance < officeDistance) {
              return hqDistance;
            }
            
            return officeDistance;
          """
        },
        "type": "Number"
      }
    }
  ]
}
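
Since Painless is hard to debug inside a search request, it may help to check the distance logic outside Elasticsearch first. The following plain Java sketch mirrors the helper from the script above. The class name NearestLocation and the helper nearestDistance are hypothetical names chosen for illustration, and the clamp before Math.acos is an added guard that is not part of the original script:

```java
import java.util.List;

class NearestLocation {

    // Spherical law of cosines, the same formula as the Painless helper.
    static double geoDistanceInMiles(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double theta = Math.toRadians(lon1 - lon2);
        double cosAngle = Math.sin(phi1) * Math.sin(phi2)
                        + Math.cos(phi1) * Math.cos(phi2) * Math.cos(theta);
        // Added guard (assumption, not in the original script): rounding can push
        // the value slightly outside [-1, 1], which would make Math.acos return NaN.
        cosAngle = Math.max(-1.0, Math.min(1.0, cosAngle));
        // degrees of arc * 60 nautical miles per degree * 1.1515 miles per nautical mile
        return Math.toDegrees(Math.acos(cosAngle)) * 60 * 1.1515;
    }

    // Hypothetical helper: the smaller of the HQ distance and the nearest office distance.
    // Each office is expected as a "lat,lon" string, as in the sample document.
    static double nearestDistance(double hqDistance, List<String> officeCoordinates,
                                  double lat, double lon) {
        double best = hqDistance;
        for (String coordinates : officeCoordinates) {
            String[] latLon = coordinates.split(",");
            double distance = geoDistanceInMiles(Double.parseDouble(latLon[0].trim()),
                                                 Double.parseDouble(latLon[1].trim()),
                                                 lat, lon);
            best = Math.min(best, distance);
        }
        return best;
    }

    public static void main(String[] args) {
        // Same coordinates as the sample document and the sort params.
        double hq = geoDistanceInMiles(40.7128, -74.006, 28.9672, -98.4786);
        System.out.println(nearestDistance(hq, List.of("39.9,-74.92"), 28.9672, -98.4786));
    }
}
```

If the offices were indexed as {lat, lon} objects rather than "lat,lon" strings, the parsing step would need to read the two map entries instead of splitting a string.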

Shameless plug: I dive deep into Elasticsearch scripting in a dedicated chapter of my ES Handbook.
