Home > front end >  What is the best way to get the average number of documents per bucket in Elasticsearch?
What is the best way to get the average number of documents per bucket in Elasticsearch?

Time:10-23

Suppose that we are hat maker and have an Elasticsearch index where each document corresponds to the sale of one hat. Part of the sales record is the name of the store at which the hat was sold. I want to find out the number of hats sold by each store, and the average number of hats sold over all stores. The best way I have been able to figure out is this search:

GET hat_sales/_search
{
  "size": 0,
  "query": {"match_all": {}},
  "aggs": {
    "stores": {
      "terms": {
        "field": "storename",
        "size": 65536
      },
      "aggs": {
        "sales_count": {
          "cardinality": {
            "field": "_id"
          }
        }
      }
    },
    "average_sales_count": {
      "avg_bucket": {
        "buckets_path": "stores>sales_count"
      }
    }
  }
}

(Aside: I set the size to 65536 because that is the default maximum number of buckets.)

The problem with this query is that the sales_count aggregation performs a redundant calculation: each stores bucket already has a doc_count property. But how can I access this doc_count in a buckets path?

CodePudding user response:

I think this is what you are after

PUT hat_sales
{
  "mappings": {
    "properties": {
      "storename": {
        "type": "keyword"
      }
    }
  }
}

POST hat_sales/_bulk?refresh=true
{"index": {}}
{"storename": "foo"}
{"index": {}}
{"storename": "foo"}
{"index": {}}
{"storename": "bar"}
{"index": {}}
{"storename": "baz"}
{"index": {}}
{"storename": "baz"}
{"index": {}}
{"storename": "baz"}



GET hat_sales/_search
{
  "size": 0,
  "query": {"match_all": {}},
  "aggs": {
    "stores": {
      "terms": {
        "field": "storename",
        "size": 65536
      }
    },
    "average_sales_count": {
      "avg_bucket": {
        "buckets_path": "stores>_count"
      }
    }
  }
}

The path to get to the _count is stores>_count

The results look like:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "stores" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "baz",
          "doc_count" : 3
        },
        {
          "key" : "foo",
          "doc_count" : 2
        },
        {
          "key" : "bar",
          "doc_count" : 1
        }
      ]
    },
    "average_sales_count" : {
      "value" : 2.0
    }
  }
}
  • Related