Each of my records in Elasticsearch has an array of objects that looks like this:
{
"counts_by_year": [
{
"year": 2022,
"works_count": 22523,
"cited_by_count": 18054
},
{
"year": 2021,
"works_count": 32059,
"cited_by_count": 24817
},
{
"year": 2020,
"works_count": 27210,
"cited_by_count": 30238
},
{
"year": 2019,
"works_count": 22592,
"cited_by_count": 33631
}
]
}
I want to sort my records by the average of works_count for the years 2021 and 2022. Is this a case for script-based sorting, or should I copy those values into a separate field and sort on that?
Edit - the mapping is:
{
"mappings": {
"_doc": {
"properties": {
"@timestamp": {
"type": "date"
},
.
.
.
"counts_by_year": {
"properties": {
"cited_by_count": {
"type": "integer"
},
"works_count": {
"type": "integer"
},
"year": {
"type": "integer"
}
}
},
.
.
.
}
}
}
}
CodePudding user response:
TL;DR
It depends.
Most likely yes, unless counts_by_year is mapped as nested.
Solution
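Something along these lines should do the trick. Note that it averages works_count across all years: in a non-nested array of objects, doc values no longer tie each works_count to its year, so this script cannot filter by year.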
GET /_search
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "doc['counts_by_year.works_count'].stream().mapToLong(x -> x).average().orElse(0);"
      }
    }
  }
}
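To illustrate why a doc-values script on a non-nested field cannot pick out individual years, here is a rough Python sketch of how an array of objects ends up as independent multi-valued fields. The helper name flatten_object_array is made up for this illustration and is not an Elasticsearch API; it only mimics the flattening and the sorted order in which numeric doc values are returned.

```python
from collections import defaultdict


def flatten_object_array(field, objects):
    """Mimic indexing of a non-nested object array: one multi-valued field per leaf."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    # Numeric doc values are returned sorted per field, so positions
    # no longer line up between "year" and "works_count".
    return {name: sorted(values) for name, values in flat.items()}


counts_by_year = [
    {"year": 2022, "works_count": 22523},
    {"year": 2021, "works_count": 32059},
    {"year": 2020, "works_count": 27210},
    {"year": 2019, "works_count": 22592},
]

flat = flatten_object_array("counts_by_year", counts_by_year)
print(flat["counts_by_year.year"])         # [2019, 2020, 2021, 2022]
print(flat["counts_by_year.works_count"])  # [22523, 22592, 27210, 32059]
```

The smallest works_count (22523) belongs to 2022, yet it sits at the same position as 2019 after flattening, so there is no reliable way to recover the pairing from these two lists alone.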
Solution (nested fields)
PUT 74404793-2
{
"mappings": {
"properties": {
"counts_by_year": {
"type": "nested",
"properties": {
"cited_by_count": {
"type": "long"
},
"works_count": {
"type": "long"
},
"year": {
"type": "long"
}
}
}
}
}
}
POST /74404793-2/_doc/
{
"counts_by_year": [
{
"year": 2022,
"works_count": 22523,
"cited_by_count": 18054
},
{
"year": 2021,
"works_count": 32059,
"cited_by_count": 24817
},
{
"year": 2020,
"works_count": 27210,
"cited_by_count": 30238
},
{
"year": 2019,
"works_count": 22592,
"cited_by_count": 33631
}
]
}
Note that this script reads the documents from _source, which can severely hurt performance if your documents are large. The filter on year > 2020 keeps only the 2021 and 2022 entries.
GET 74404793-2/_search
{
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": """
          params._source['counts_by_year']
            .stream()
            .filter(x -> x['year'] > 2020)
            .mapToLong(x -> x['works_count'])
            .average().orElse(0);"""
      }
    }
  }
}
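As for the other option raised in the question, copying the value into a separate field: a minimal sketch of computing it on the client before indexing could look like the following. The function name average_works_count and the field name works_count_avg_recent are hypothetical, chosen only for this example, and the record is the sample document from the question.

```python
def average_works_count(record, years=(2021, 2022)):
    """Average works_count over the given years; 0.0 if none of them are present."""
    counts = [
        entry["works_count"]
        for entry in record.get("counts_by_year", [])
        if entry["year"] in years
    ]
    return sum(counts) / len(counts) if counts else 0.0


record = {
    "counts_by_year": [
        {"year": 2022, "works_count": 22523, "cited_by_count": 18054},
        {"year": 2021, "works_count": 32059, "cited_by_count": 24817},
        {"year": 2020, "works_count": 27210, "cited_by_count": 30238},
        {"year": 2019, "works_count": 22592, "cited_by_count": 33631},
    ]
}

# Hypothetical field name; store it alongside the original array before indexing.
record["works_count_avg_recent"] = average_works_count(record)
print(record["works_count_avg_recent"])  # 27291.0
```

With that field indexed as a numeric type, a plain field sort on works_count_avg_recent should be able to replace the script sort, at the cost of recomputing the field whenever counts_by_year changes.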