I have some documents which contain a nested field planList which is an array of objects. It's length will always be 2 or 3. And the objects always have 3 keys - planName, planType, and planId.
I want to write a query which will return a list of all planNames, along with the most frequent plan that it was present with and it's frequency.
For example, consider the following 4 documents -
{planList: [{planName: a, planType: x, planId: 1},{planName: b, planType: x, planId: 2}]}
{planList: [{planName: a, planType: x, planId: 1},{planName: b, planType: x, planId: 2},{planName: c, planType: y, planId: 3}]}
{planList: [{planName: a, planType: x, planId: 1},{planName: c, planType: y, planId: 3}]}
{planList: [{planName: d, planType: y, planId: 4},{planName: c, planType: y, planId: 3}]}
The response for this data should have the following insights -
plan A was found with plan B and plan C 2 times (draw between plan B and C)
plan B was found with plan A 2 times
plan C was found with plan A 2 times
plan D was found with plan C one time
The query to get to these docs is very simple -
"query": {
"bool": {
"must": [
{
"match": {
"event": "comparePlans"
}
}
]
}
}
Can someone please help me with the aggregations needed to get the insight I'm looking for? Thanks.
Edit: The mapping for the above index looks like this -
"planList": {
"type": "nested",
"properties": {
"planId": {
"type": "keyword"
},
"planName": {
"type": "keyword"
},
"planType": {
"type": "keyword"
}
}
},
"event": {
"type": "keyword",
"null_value": "none"
}
CodePudding user response:
As far as I'm aware, there is no perfect way to do it outside of a scripted aggregation, however, this is close. Pay attention that this aggregation also counts the appearances of a plan name with itself.
If you know that the same plan cannot appear twice in the array, you can just get this answer and then filter out the same plan name from each bucket.
Query:
{
"size": 0,
"aggs": {
"ALL_PLANS": {
"nested": {
"path": "planList"
},
"aggs": {
"GROUPED_BY_NAME": {
"terms": {
"field": "planList.planName",
"size": 10
},
"aggs": {
"REVERSE_TO_ROOT": {
"reverse_nested": {},
"aggs": {
"ALL_PLANS": {
"nested": {
"path": "planList"
},
"aggs": {
"GROUPED_BY_NAME": {
"terms": {
"field": "planList.planName",
"size": 10
}
}
}
}
}
}
}
}
}
}
}
}
Response:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"ALL_PLANS" : {
"doc_count" : 9,
"GROUPED_BY_NAME" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "a",
"doc_count" : 3,
"REVERSE_TO_ROOT" : {
"doc_count" : 3,
"ALL_PLANS" : {
"doc_count" : 7,
"GROUPED_BY_NAME" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "a",
"doc_count" : 3
},
{
"key" : "b",
"doc_count" : 2
},
{
"key" : "c",
"doc_count" : 2
}
]
}
}
}
},
{
"key" : "c",
"doc_count" : 3,
"REVERSE_TO_ROOT" : {
"doc_count" : 3,
"ALL_PLANS" : {
"doc_count" : 7,
"GROUPED_BY_NAME" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "c",
"doc_count" : 3
},
{
"key" : "a",
"doc_count" : 2
},
{
"key" : "b",
"doc_count" : 1
},
{
"key" : "d",
"doc_count" : 1
}
]
}
}
}
},
{
"key" : "b",
"doc_count" : 2,
"REVERSE_TO_ROOT" : {
"doc_count" : 2,
"ALL_PLANS" : {
"doc_count" : 5,
"GROUPED_BY_NAME" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "a",
"doc_count" : 2
},
{
"key" : "b",
"doc_count" : 2
},
{
"key" : "c",
"doc_count" : 1
}
]
}
}
}
},
{
"key" : "d",
"doc_count" : 1,
"REVERSE_TO_ROOT" : {
"doc_count" : 1,
"ALL_PLANS" : {
"doc_count" : 2,
"GROUPED_BY_NAME" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "c",
"doc_count" : 1
},
{
"key" : "d",
"doc_count" : 1
}
]
}
}
}
}
]
}
}
}
}
So we can see that "a" appears with "a" 3 times (ignore this), and with "b" and "c" 2 times.