I need to sort on a single value that a script computes from two parts of each document. For each document, the minimum of the distances from a given location to the HQ and to each office should be calculated and returned for sorting. Since I can return only one value, I need to combine the script that calculates the distance between the HQ and the given location with the one that calculates the distance between the multiple offices and the given location.
I tried to combine them, but offices is a nested property and headquarters is a non-nested property. If I use "NestedPath", I am somehow not able to access the headquarters property. Without "NestedPath", I am not able to use the offices property. Here is the mapping:
"offices" : {
"type" : "nested",
"properties" : {
"coordinates" : {
"type" : "geo_point",
"fields" : {
"raw" : {
"type" : "text",
"index" : false
}
},
"ignore_malformed" : true
},
"state" : {
"type" : "text"
}
}
},
"headquarters" : {
"properties" : {
"coordinates" : {
"type" : "geo_point",
"fields" : {
"raw" : {
"type" : "text",
"index" : false
}
},
"ignore_malformed" : true
},
"state" : {
"type" : "text"
}
}
}
And here is the script that I tried :
"sort": [
{
"_script": {
"nested" : {
"path" : "offices"
},
"order": "asc",
"script": {
"lang": "painless",
"params": {
"lat": 28.9672,
"lon": -98.4786
},
"source": "def hqDistance = 1000000;if (!doc['headquarters.coordinates'].empty){hqDistance = doc['headquarters.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371;} def officeDistance= doc['offices.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371; if (hqDistance < officeDistance) { return hqDistance; } return officeDistance;"
},
"type": "Number"
}
}
],
When I run the script, the headquarters logic does not seem to be executed at all; I get results based only on the offices distance.
Can someone help me with this? Thank you.
CodePudding user response:
Nested fields operate in a separate context and their content cannot be accessed from the outer level, nor vice versa.
You can, however, access a document's raw _source.
But there's a catch:
- See, when iterating under the offices nested path, you were able to call .arcDistance because the coordinates are of type ScriptDocValues.GeoPoint.
- But once you access the raw _source, you'll be dealing with an unoptimized set of java.util.ArrayLists and java.util.HashMaps.
This means that even though you can iterate an array list:
...
for (def office : params._source['offices']) {
// office.coordinates is a trivial HashMap of {lat, lon}!
}
calculating geo distances won't be directly possible…
…unless you write your own geoDistance function -- which is perfectly fine with Painless, but it'll need to be defined at the top of a script.
No need to reinvent the wheel though: Calculating distance between two points, using latitude longitude?
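Before embedding a hand-written distance function in a Painless script, it can help to sanity-check the formula outside Elasticsearch. The following is a minimal sketch in Python (not Painless) of the spherical law of cosines with the same constants as the geoDistanceInMiles function in the sample script. The name geo_distance_in_miles and the clamping step are assumptions added for illustration; they do not come from the linked answer.

```python
import math

# Same constants as the Painless function: 60 * 1.1515 miles per degree of arc.
MILES_PER_DEGREE = 60 * 1.1515


def geo_distance_in_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles via the spherical law of cosines."""
    theta = math.radians(lon1 - lon2)
    cos_angle = (
        math.sin(math.radians(lat1)) * math.sin(math.radians(lat2))
        + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.cos(theta)
    )
    # Assumption (not in the original script): clamp to [-1, 1], because
    # floating-point rounding can push the value just outside acos's domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle)) * MILES_PER_DEGREE


# One degree of arc along the equator should come out close to 60 * 1.1515 miles.
print(geo_distance_in_miles(0.0, 0.0, 0.0, 1.0))
```

Note that the two products are added together; if the plus sign is dropped, the expression is no longer valid code in either language.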
A sample implementation
Assuming your documents look like this:
POST my-index/_doc
{
"offices": [
{
"coordinates": "39.9,-74.92",
"state": "New Jersey"
}
],
"headquarters": {
"coordinates": {
"lat": 40.7128,
"lon": -74.006
},
"state": "NYC"
}
}
your sorting script could look like this:
GET my-index/_search
{
"sort": [
{
"_script": {
"order": "asc",
"script": {
"lang": "painless",
"params": {
"lat": 28.9672,
"lon": -98.4786
},
"source": """
// We can declare functions at the beginning of a Painless script
// https://www.elastic.co/guide/en/elasticsearch/painless/current/painless-functions.html#painless-functions
double deg2rad(double deg) {
return (deg * Math.PI / 180.0);
}
double rad2deg(double rad) {
return (rad * 180.0 / Math.PI);
}
// https://stackoverflow.com/a/3694410/8160318
double geoDistanceInMiles(def lat1, def lon1, def lat2, def lon2) {
double theta = lon1 - lon2;
double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2)) + Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta));
dist = Math.acos(dist);
dist = rad2deg(dist);
return dist * 60 * 1.1515;
}
// start off arbitrarily high
def hqDistance = 1000000;
if (!doc['headquarters.coordinates'].empty) {
hqDistance = doc['headquarters.coordinates'].arcDistance(params.lat, params.lon) * 0.000621371;
}
// assume office distance as large as hq distance
def officeDistance = hqDistance;
// iterate over each office and compare it to the currently lowest officeDistance
for (def office : params._source['offices']) {
// the coordinates are formatted as "lat,lon" so let's split...
def latLong = Arrays.asList(office.coordinates.splitOnToken(","));
// ...and parse them before passing onwards
def tmpOfficeDistance = geoDistanceInMiles(Float.parseFloat(latLong[0]),
Float.parseFloat(latLong[1]),
params.lat,
params.lon);
// we're interested in the nearest office...
if (tmpOfficeDistance < officeDistance) {
officeDistance = tmpOfficeDistance;
}
}
if (hqDistance < officeDistance) {
return hqDistance;
}
return officeDistance;
"""
},
"type": "Number"
}
}
]
}
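The script above assumes that every office stores its coordinates as a "lat,lon" string. Since _source returns values exactly as they were indexed, and geo_point fields also accept an object with lat and lon keys or a [lon, lat] array, documents indexed in a different shape would need their own parsing branch. A rough sketch of that branching in Python (not Painless); parse_point is a hypothetical helper name, and other accepted formats such as geohashes are deliberately not handled:

```python
def parse_point(value):
    """Return (lat, lon) as floats from a raw _source geo_point value."""
    if isinstance(value, str):
        # "lat,lon" string, as in the sample document
        lat, lon = value.split(",")
        return float(lat), float(lon)
    if isinstance(value, dict):
        # {"lat": ..., "lon": ...} object
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, (list, tuple)):
        # [lon, lat] array: note that longitude comes first here
        lon, lat = value
        return float(lat), float(lon)
    raise TypeError("unsupported geo_point representation: %r" % (value,))


print(parse_point("39.9,-74.92"))
print(parse_point({"lat": 40.7128, "lon": -74.006}))
```

The same idea would have to be ported to Painless before it could be used inside the sort script; if all documents are indexed in one consistent format, a single branch is enough.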
Shameless plug: I dive deep into Elasticsearch scripting in a dedicated chapter of my ES Handbook.