I am using Java with Spring Data and the Elasticsearch 6.8.14 API to communicate with Elasticsearch. I have an index that returns data like this (I am including the search result to show the mapping structure as well):
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "rgt",
"_type" : "carindexeddata",
"_id" : "6020354",
"_score" : 1.0,
"_source" : {
"id" : "4441",
"version" : null,
"carId" : "1263",
"mark" : "ford",
"colour" : "green",
"status" : "Approved",
......
So basically I store cars. Now I need to sort them before returning them to the user. I have to sort by:
- mark
- colour (within the same mark, the colour order matters)
- status
For the status, the sort order should be as follows:
1. BOUGHT
2. IN PRODUCTION
3. IN TESTS
4. APPROVED
So an order like this would be correct:
1. Ford Black Bought
2. Ford Black Approved
3. Ford White Bought
4. GMC White Bought
5. GMC White Approved
Which mechanism in Elasticsearch could I use to sort items this way? Is it possible to implement? Can you show an example? Sorting directly by the fields mark, colour and status is not correct, because status has custom ordering logic: it is not alphabetical but based on some kind of weight. How can I assign specific weights to specific statuses in Elasticsearch? Should I store a numeric field for each status in Elasticsearch and sort on that field instead of on the status field directly?
CodePudding user response:
For the status field, you can use a script-based sort (the triple-quoted script below is Kibana Dev Tools console syntax):
{
  "sort": [
    { "mark.keyword": "asc" },
    { "colour.keyword": "asc" },
    {
      "_script": {
        "type": "number",
        "script": {
          "lang": "painless",
          "source": """
            def status = doc['status.keyword'].value.toUpperCase();
            if (status == "BOUGHT")
              return 1;
            else if (status == "IN PRODUCTION")
              return 2;
            else if (status == "IN TESTS")
              return 3;
            else return 4;
          """
        },
        "order": "asc"
      }
    }
  ]
}
Script sorts are slow, because the script has to run for every matching document at query time.
Elasticsearch works best when the data is preprocessed. If you can index a numeric field that represents the status weight, sorting on that field will perform better. Test both approaches to see which works best for your case.
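As a rough illustration of the preprocessing idea, here is a minimal Java sketch that derives a numeric weight from the status before the document is indexed. Everything in it is an assumption made for illustration: the class StatusRankSketch, the Car stand-in, and the field name statusRank are hypothetical and not part of Spring Data or the Elasticsearch client, and treating unknown statuses as "sort last" is a choice you may want to change. The in-memory comparator only mimics what sorting on mark, colour and the numeric field (all ascending) would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; these names are not part of any Elasticsearch or Spring Data API.
final class StatusRankSketch {

    // Weight for a status, following the order given in the question (lower sorts first).
    // Assumption: unknown or missing statuses sort last.
    static int statusRank(String status) {
        if (status == null) {
            return Integer.MAX_VALUE;
        }
        switch (status.trim().toUpperCase(Locale.ROOT)) {
            case "BOUGHT":        return 1;
            case "IN PRODUCTION": return 2;
            case "IN TESTS":      return 3;
            case "APPROVED":      return 4;
            default:              return Integer.MAX_VALUE;
        }
    }

    // Simplified stand-in for the indexed car document.
    static final class Car {
        final String mark;
        final String colour;
        final String status;
        final int statusRank; // the extra numeric field that would be indexed

        Car(String mark, String colour, String status) {
            this.mark = mark;
            this.colour = colour;
            this.status = status;
            this.statusRank = StatusRankSketch.statusRank(status);
        }

        @Override
        public String toString() {
            return mark + " " + colour + " " + status;
        }
    }

    // In-memory equivalent of sorting by mark, then colour, then statusRank, all ascending.
    static List<Car> sortCars(List<Car> cars) {
        List<Car> sorted = new ArrayList<>(cars);
        sorted.sort(Comparator.comparing((Car c) -> c.mark)
                .thenComparing(c -> c.colour)
                .thenComparingInt(c -> c.statusRank));
        return sorted;
    }

    public static void main(String[] args) {
        List<Car> cars = Arrays.asList(
                new Car("GMC", "White", "Approved"),
                new Car("Ford", "White", "Bought"),
                new Car("Ford", "Black", "Approved"),
                new Car("GMC", "White", "Bought"),
                new Car("Ford", "Black", "Bought"));
        sortCars(cars).forEach(System.out::println);
    }
}
```

With such a field stored on each document, the query would no longer need the script: it could sort on mark.keyword, colour.keyword and the numeric field in ascending order. The trade-off is that changing the weights means reindexing the affected documents.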