Elasticsearch Aggregation most common list of integers

I am looking for an Elasticsearch aggregation that returns the most common list for a certain field. For example, for these documents:

{"ToneCurvePV2012": [1,2,3]}
{"ToneCurvePV2012": [1,5,6]}
{"ToneCurvePV2012": [1,7,8]}
{"ToneCurvePV2012": [1,2,3]}

I want the aggregation to return [1,2,3], since it appears twice.

So far, every aggregation I have tried returns 1.

Answer:

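This is not possible with a plain terms aggregation: an array field is indexed as a set of individual values, so each element (1, 2, 3, ...) becomes its own bucket, and 1 wins because it appears in every document. Instead, you can use a terms aggregation with a script. Note that the script runs for every matching document at search time, which can hurt cluster performance on large indices.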
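The reason a plain terms aggregation returns 1 is that it treats each element of an array field as a separate value. The Python sketch below is client-side only (not Elasticsearch code) and simply mimics that behaviour on the four sample documents: each document is counted once for every distinct element it contains.

```python
from collections import Counter

# The four sample documents from the question.
docs = [
    {"ToneCurvePV2012": [1, 2, 3]},
    {"ToneCurvePV2012": [1, 5, 6]},
    {"ToneCurvePV2012": [1, 7, 8]},
    {"ToneCurvePV2012": [1, 2, 3]},
]

# Mimic a terms aggregation on an array field: every document falls into one
# bucket per distinct element, so the buckets are integers, not whole arrays.
element_counts = Counter()
for doc in docs:
    element_counts.update(set(doc["ToneCurvePV2012"]))

print(element_counts.most_common(1))  # [(1, 4)]
```

Since 1 is present in all four documents, it is always the top bucket, which matches what the question describes.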

Here, I have used a script that builds a string from the array and aggregates on that string. For example, the array [1,2,3] becomes the string '[1,2,3]', and that string is used as the bucket key.
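As a rough client-side illustration of the same idea, here is a Python sketch that builds the string key for each document and counts documents per key. `list_key` is a hypothetical helper, not part of any Elasticsearch client. It sorts the values because the script reads doc values, which return a multi-valued numeric field in ascending order, so arrays with the same elements in a different order end up with the same key.

```python
from collections import Counter

def list_key(values):
    """Build a string key like the Painless script does, e.g. [1, 2, 3] -> '[1,2,3]'.

    Values are sorted to mimic doc values, which return a multi-valued
    numeric field in ascending order.
    """
    return "[" + ",".join(str(v) for v in sorted(values)) + "]"

# The four sample documents from the question.
docs = [
    {"ToneCurvePV2012": [1, 2, 3]},
    {"ToneCurvePV2012": [1, 5, 6]},
    {"ToneCurvePV2012": [1, 7, 8]},
    {"ToneCurvePV2012": [1, 2, 3]},
]

# One bucket per distinct key, counted per document, like a terms aggregation.
key_counts = Counter(list_key(doc["ToneCurvePV2012"]) for doc in docs)
for key, count in key_counts.most_common():
    print(key, count)
# [1,2,3] 2
# [1,5,6] 1
# [1,7,8] 1
```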

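Below is a sample query you can use to get the aggregation you expect. The buckets are sorted by doc_count in descending order by default, so the first bucket is the most common list; add "size": 1 to the terms aggregation if you only need that one.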

POST index1/_search
{
  "size": 0,
  "aggs": {
    "tone_s": {
      "terms": {
        "script": {
          "lang": "painless",
          "source": "def value = '['; for (int i = 0; i < doc['ToneCurvePV2012'].length; i++) { value = value + doc['ToneCurvePV2012'][i] + ','; } value += ']'; value = value.replace(',]', ']'); return value;"
        }
      }
    }
  }
}

Output:

{
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "tone_s" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "[1,2,3]",
          "doc_count" : 2
        },
        {
          "key" : "[1,5,6]",
          "doc_count" : 1
        },
        {
          "key" : "[1,7,8]",
          "doc_count" : 1
        }
      ]
    }
  }
}

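PS: the key is returned as a string, not as an array, in the aggregation response. Also, because the script reads doc values, which store a multi-valued numeric field in ascending order, arrays with the same elements in a different order (for example [3,2,1] and [1,2,3]) produce the same key.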
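If you need the list back as an array, you can parse the string key on the client. A minimal Python sketch, assuming the search response has already been decoded into a dict; the trimmed `response` below is copied from the sample output above rather than fetched from a cluster.

```python
import json

# Trimmed copy of the "aggregations" part of the sample response.
response = {
    "aggregations": {
        "tone_s": {
            "buckets": [
                {"key": "[1,2,3]", "doc_count": 2},
                {"key": "[1,5,6]", "doc_count": 1},
                {"key": "[1,7,8]", "doc_count": 1},
            ]
        }
    }
}

buckets = response["aggregations"]["tone_s"]["buckets"]
# Terms buckets are ordered by doc_count (descending) by default,
# so the first bucket holds the most common list.
top = buckets[0]
most_common_list = json.loads(top["key"])  # "[1,2,3]" -> [1, 2, 3]
print(most_common_list, top["doc_count"])  # [1, 2, 3] 2
```

Because the key is built as "[a,b,c]", it happens to be valid JSON, which is why json.loads can turn it straight back into a list of integers.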