Following is the data stored in ElasticSearch:
[
{
"id": 1,
"class": "class 1",
"name": "Scott",
"scores": [
{ "year": 2022, "score": 100 },
{ "year": 2011, "score": 80 },
{ "year": 2003, "score": 70 }
]
},
{
"id": 2,
"class": "class 1",
"name": "Gabriel",
"scores": [
{ "year": 2015, "score": 90 },
{ "year": 2011, "score": 70 }
]
},
{
"id": 3,
"class": "class 2",
"name": "Scott",
"scores": [
{ "year": 2022, "score": 100 },
{ "year": 2021, "score": 100 },
{ "year": 2003, "score": 80 }
]
},
{
"id": 4,
"class": "class 2",
"name": "Pierce",
"scores": [
{ "year": 2022, "score": 70 }
]
}
]
Is there a way in ElasticSearch to merge/combine scores into one array by specific group? (keep the duplicates values)
For example:
- Group by class, it will show the scores of class 1 and class 2, and just keep the class and scores fields, the result would be:
[
{
"class": "class 1",
"scores": [
{ "year": 2022, "score": 100 },
{ "year": 2015, "score": 90 },
{ "year": 2011, "score": 80 },
{ "year": 2011, "score": 70 },
{ "year": 2003, "score": 70 }
]
},
{
"class": "class 2",
"scores": [
{ "year": 2022, "score": 100 },
{ "year": 2022, "score": 70 },
{ "year": 2021, "score": 100 },
{ "year": 2003, "score": 80 }
]
}
]
- Group by name, it will put all the scores of Scott into one array, and keep only name and scores fields:
[
{
"name": "Scott",
"scores": [
{ "year": 2022, "score": 100 },
{ "year": 2022, "score": 100 },
{ "year": 2021, "score": 100 },
{ "year": 2011, "score": 80 },
{ "year": 2003, "score": 80 },
{ "year": 2003, "score": 70 }
]
},
{
"name": "Gabriel",
"scores": [
{ "year": 2015, "score": 90 },
{ "year": 2011, "score": 70 }
]
},
{
"name": "Pierce",
"scores": [
{ "year": 2022, "score": 70 }
]
}
]
Thanks!
CodePudding user response:
Disclaimer: Tighten your seatbelt it is going to be verbose ^^
TLDR;
Yes, it is possible, using term aggregation.
Such as, grouping by class:
You will find a bucket called byClass
of type term
.
Elastic will create bucket of documents per value in the field class.
-> class 1
and class 2
But as you will notice it create more aggregation in this aggregation.
-> nestedAGG
, byyear
and bynotes
The first one is a specificity of Elastic.
The 2 other further break down the bucket by year then notes respectively.
GET /71128503/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"byClass": {
"terms": {
"field": "class",
"size": 10
},
"aggs": {
"nestedAGG": {
"nested": {
"path": "scores"
},
"aggs": {
"byyear": {
"terms": {
"field": "scores.year",
"size": 10
},
"aggs": {
"bynotes": {
"terms": {
"field": "scores.score",
"size": 10
}
}
}
}
}
}
}
}
}
}
{
...
"aggregations" : {
"byClass" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "class 1",
"doc_count" : 2,
"nestedAGG" : {
"doc_count" : 5,
"byyear" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 2011,
"doc_count" : 2,
"bynotes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 70,
"doc_count" : 1
},
{
"key" : 80,
"doc_count" : 1
}
]
}
},
{
"key" : 2003,
"doc_count" : 1,
"bynotes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 70,
"doc_count" : 1
}
]
}
},
{
"key" : 2015,
"doc_count" : 1,
"bynotes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 90,
"doc_count" : 1
}
]
}
},
{
"key" : 2022,
"doc_count" : 1,
"bynotes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 100,
"doc_count" : 1
}
]
}
}
]
}
}
},
{
"key" : "class 2",
"doc_count" : 2,
"nestedAGG" : {
"doc_count" : 4,
"byyear" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 2022,
"doc_count" : 2,
"bynotes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 70,
"doc_count" : 1
},
{
"key" : 100,
"doc_count" : 1
}
]
}
},
{
"key" : 2003,
"doc_count" : 1,
"bynotes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 80,
"doc_count" : 1
}
]
}
},
{
"key" : 2021,
"doc_count" : 1,
"bynotes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 100,
"doc_count" : 1
}
]
}
}
]
}
}
}
]
}
}
}
To reproduce
To ingest the data. Notice the custom mapping:
- keyword (grouping by term should not be done on text type fields)
- Nested (Elastic by default flatten objects)
PUT /71128503/
{
"settings": {},
"mappings": {
"properties": {
"class": {
"type": "keyword"
},
"name":{
"type": "keyword"
},
"scores":{
"type": "nested",
"properties": {
"score": {
"type": "integer"
},
"year": {
"type": "integer"
}
}
}
}
}
}
POST /_bulk
{"index":{"_index":"71128503","_id":1}}
{"class":"class 1","name":"Scott","scores":[{"year":2022,"score":100},{"year":2011,"score":80},{"year":2003,"score":70}]}
{"index":{"_index":"71128503","_id":2}}
{"class":"class 1","name":"Gabriel","scores":[{"year":2015,"score":90},{"year":2011,"score":70}]}
{"index":{"_index":"71128503","_id":3}}
{"class":"class 2","name":"Scott","scores":[{"year":2022,"score":100},{"year":2021,"score":100},{"year":2003,"score":80}]}
{"index":{"_index":"71128503","_id":4}}
{"class":"class 2","name":"Pierce","scores":[{"year":2022,"score":70}]}
Then to query the data:
By class / By name
GET /71128503/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"byName": { <- Name of your bucket
"terms": { <- Type of grouping, Elastic support many, like sum, avg on numeric value ....
"field": "name", <- Field you grouping on
"size": 10
},
"aggs": {
"nestedAGG": {
"nested": {
"path": "scores"
},
"aggs": {
"byyear": {
"terms": {
"field": "scores.year",
"size": 10
},
"aggs": {
"bynotes": {
"terms": {
"field": "scores.score",
"size": 10
}
}
}
}
}
}
}
}
}
}