I have an aggregation that identifies duplicate records:
{
  "size": 0,
  "aggs": {
    "myfield": {
      "terms": {
        "field": "myfield.keyword",
        "size": 250,
        "min_doc_count": 2
      }
    }
  }
}
However, it misses many duplicates because size is too low: the actual cardinality of the field is over 2 million. If size is increased to the actual cardinality, or to some other much larger number, all of the duplicate documents are found, but the operation takes roughly 5x longer to complete.
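To make it concrete, the larger-size variant looks like the request below; the 2500000 value is only illustrative, any number above the actual cardinality behaves the same way:

{
  "size": 0,
  "aggs": {
    "myfield": {
      "terms": {
        "field": "myfield.keyword",
        "size": 2500000,
        "min_doc_count": 2
      }
    }
  }
}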
If I change size to a larger number, should I expect slow performance or other adverse effects on other operations while this aggregation is running?
CodePudding user response:
Yes, the size param is critical to Elasticsearch aggregation performance. If you set it to a very large number such as 10k (the limit set by Elasticsearch, which you can change via the search.max_buckets setting), it will have an adverse impact not only on the aggregation you are running but on all other operations running in the Elasticsearch cluster.
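If you do decide to raise the limit, here is a minimal sketch of what that looks like, assuming Elasticsearch 7.x where search.max_buckets is a dynamic cluster-level setting (the 20000 value is only an example):

PUT _cluster/settings
{
  "persistent": {
    "search.max_buckets": 20000
  }
}

Raising this limit trades cluster-wide memory and latency for the completeness of a single aggregation, so it affects every search running on the cluster, not just yours.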
As you are using the terms aggregation, which is a bucket aggregation, you can read more about it in the Elasticsearch documentation on bucket aggregations.