My data looks like the following:
[
{
"name": "Scott",
"origin": "London",
"travel": [
{
"active": false,
"city": "Berlin",
"visited": "2020-02-01"
},
{
"active": true,
"city": "Prague",
"visited": "2020-02-15"
}
]
},
{
"name": "Lilly",
"origin": "London",
"travel": [
{
"active": true,
"city": "Scotland",
"visited": "2020-02-01"
}
]
}
]
I want to perform an aggregation where each top-level origin is a bucket, then a nested aggregation to see how many people are currently visiting each city. I therefore only care what the city is if active
is true
.
Using a filter, it will search the visited
array and return the complete array (both objects) if one has active
set to true. I do not want to include cities where active
is false.
Desired output:
{
"aggregations": {
"origin": {
"buckets": [
{
"key": "London",
"buckets": [
{
"key": "travel",
"doc_count": 2555,
"buckets": [
{
"key": "Scotland",
"doc_count": 1
},
{
"key": "Prague",
"doc_count": 1
}
]
}
]
}
]
}
}
}
Above I only have 2 counts of under the travel aggregation because only two travel objects have active set to true.
Currently, I have my aggregation set up like so:
{
"from": 0,
"aggs": {
"origin": {
"terms": {
"field": "origin"
},
"aggs": {
"travel": {
"filter": {
"term": {
"travel.active": true
}
},
"aggs": {
"city": {
"terms": {
"field": "city"
}
}
}
}
}
}
}
}
I have my top level aggregation on origin
, then a nested agg on the travel
array. Here I have a filter on travel.active = true
, then another nested agg to create buckets for each city.
In my aggregation, it's still producing Berlin
as a city, even though I'm filtering on active = true.
My guess is because it's allowing it as active: true
is true for one of the objects in the array.
How do I filter out active: false
completely from the aggregation?
CodePudding user response:
You will have to use "nested aggregation." Official documentation link for a reference
Here is an example for your query:
Mapping:
PUT /city_index
{
"mappings": {
"properties": {
"name" : { "type" : "keyword" },
"origin" : { "type" : "keyword" },
"travel": {
"type": "nested",
"properties": {
"active": {
"type": "boolean"
},
"city": {
"type": "keyword"
},
"visited" : {
"type":"date"
}
}
}
}
}
}
Insert:
PUT /city_index/_doc/1
{
"name": "Scott",
"origin" : "London",
"travel": [
{
"active": false,
"city": "Berlin",
"visited" : "2020-02-01"
},
{
"active": true,
"city": "Prague",
"visited": "2020-02-15"
}
]
}
PUT /city_index/_doc/2
{
"name": "Lilly",
"origin": "London",
"travel": [
{
"active": true,
"city": "Scotland",
"visited": "2020-02-01"
}
]
}
Query:
GET /city_index/_search
{
"size": 0,
"aggs": {
"origin": {
"terms": {
"field": "origin"
},
"aggs": {
"city": {
"nested": {
"path": "travel"
},
"aggs": {
"travel": {
"filter": {
"term": {
"travel.active": true
}
},
"aggs": {
"city": {
"terms": {
"field": "travel.city"
}
}
}
}
}
}
}
}
}
}
Output:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"origin": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "London",
"doc_count": 2,
"city": {
"doc_count": 3,
"travel": {
"doc_count": 2,
"city": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Prague",
"doc_count": 1
},
{
"key": "Scotland",
"doc_count": 1
}
]
}
}
}
}
]
}
}
}
CodePudding user response:
The suggestion @karthick is good but I add the filter in query. That way you will have a smaller amount of values in the aggregation stage.
GET idx_travel/_search
{
"size": 0,
"query": {
"nested": {
"path": "travel",
"query": {
"term": {
"travel.active": {
"value": true
}
}
}
}
},
"aggs": {
"origin": {
"terms": {
"field": "origin"
},
"aggs": {
"city": {
"nested": {
"path": "travel"
},
"aggs": {
"city": {
"terms": {
"field": "travel.city"
}
}
}
}
}
}
}
}