Elasticsearch, date histogram omits buckets

Time:08-02

I am trying to reproduce the histogram shown in Kibana's "Discover" tab using the date_histogram aggregation.

My request is as follows:

GET index-*/_search
{
  "size": 0,
  "aggs": {
    "stats": {
      "date_histogram": {
        "min_doc_count": 0,
        "missing": 0, 
        "time_zone": "+03:00",
        "field": "@timestamp",
        "fixed_interval": "1h",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  },
  "query": {
    "bool": {
      "filter": {
        "range": {
          "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2022-08-01T11:00:00.000Z",
              "lte": "2022-08-02T11:14:34.158Z"
          }
        }
      }
    }
  }
}

The query itself is correct, as the total hit count in the response confirms:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 782,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "stats": {
      "buckets": [
        {
          "key_as_string": "2022-08-01 14:00:00",
          "key": 1659351600000,
          "doc_count": 450
        },
        {
          "key_as_string": "2022-08-01 15:00:00",
          "key": 1659355200000,
          "doc_count": 265
        },
...................................
        {
          "key_as_string": "2022-08-02 09:00:00",
          "key": 1659420000000,
          "doc_count": 0
        },
        {
          "key_as_string": "2022-08-02 10:00:00",
          "key": 1659423600000,
          "doc_count": 31
        },
        {
          "key_as_string": "2022-08-02 11:00:00",
          "key": 1659427200000,
          "doc_count": 0
        },
        {
          "key_as_string": "2022-08-02 12:00:00",
          "key": 1659430800000,
          "doc_count": 1
        }
      ]
    }
  }
}

The problem is that empty buckets at the end of the range are not returned, even though min_doc_count is set to 0. I get 23 buckets instead of 24, and the list stops at the last non-empty bucket.

If the last bucket is not empty, all 24 buckets are returned correctly. Empty buckets in between are also returned correctly, with a doc_count of 0.

How can I get the missing buckets? Or is this not possible with date_histogram?

Thanks

CodePudding user response:

Try extended_bounds with the same date range as your range filter, so that the first and last buckets are included even if they are empty:

GET index-*/_search
{
  "size": 0,
  "aggs": {
    "stats": {
      "date_histogram": {
        "min_doc_count": 0,
        "missing": 0, 
        "time_zone": "+03:00",
        "field": "@timestamp",
        "fixed_interval": "1h",
        "extended_bounds": {
          "min": "2022-08-01T11:00:00.000Z",
          "max": "2022-08-02T11:14:34.158Z"
        }
      }
    }
  },
  "query": {
    "bool": {
      "filter": {
        "range": {
          "@timestamp": {
              "format": "strict_date_optional_time",
              "gte": "2022-08-01T11:00:00.000Z",
              "lte": "2022-08-02T11:14:34.158Z"
          }
        }
      }
    }
  }
}
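
Since the bounds and the range filter have to stay in sync, it may help to build the request body from a single pair of dates. The following is only a sketch in Python: the function name build_histogram_request and its parameters are made up for illustration, and the body simply mirrors the request above. It builds a plain dict and does not call any Elasticsearch client.

```python
import json


def build_histogram_request(gte, lte, interval="1h", time_zone="+03:00"):
    """Build a search body whose extended_bounds mirror the range filter.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    return {
        "size": 0,
        "aggs": {
            "stats": {
                "date_histogram": {
                    "min_doc_count": 0,
                    "missing": 0,
                    "time_zone": time_zone,
                    "field": "@timestamp",
                    "fixed_interval": interval,
                    # Same dates as the range filter below, so empty
                    # buckets at either end of the range are kept.
                    "extended_bounds": {"min": gte, "max": lte},
                }
            }
        },
        "query": {
            "bool": {
                "filter": {
                    "range": {
                        "@timestamp": {
                            "format": "strict_date_optional_time",
                            "gte": gte,
                            "lte": lte,
                        }
                    }
                }
            }
        },
    }


body = build_histogram_request(
    "2022-08-01T11:00:00.000Z", "2022-08-02T11:14:34.158Z"
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the Kibana console as the body of GET index-*/_search.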

CodePudding user response:

Explanation of the behavior mentioned in the question

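Histogram buckets are created from the dates of the documents being aggregated, not from the range in the query. The first bucket is based on the earliest date among the matching documents and the last bucket on the latest date. With min_doc_count set to 0, the gaps between those two buckets are then filled with empty buckets (doc_count of 0). Nothing is created before the first or after the last document, which is why trailing empty buckets are missing.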
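
To make this concrete, here is a small Python simulation of that behavior. It is a simplified model, not Elasticsearch's actual implementation: data_bound_buckets is a made-up name, timestamps are epoch milliseconds, and flooring is done in UTC, which gives the same hourly boundaries as +03:00 only because the offset is a whole number of hours. The bucket keys are taken from the response in the question; the exact document timestamps inside each bucket are invented.

```python
HOUR_MS = 3_600_000


def data_bound_buckets(doc_timestamps_ms, interval_ms=HOUR_MS):
    """Return {bucket_key: doc_count} spanning only earliest..latest document."""
    counts = {}
    for ts in doc_timestamps_ms:
        key = ts - ts % interval_ms  # floor to the start of the interval
        counts[key] = counts.get(key, 0) + 1
    if not counts:
        return {}
    first, last = min(counts), max(counts)
    # Fill gaps with 0, but only between the first and last non-empty bucket.
    return {
        key: counts.get(key, 0)
        for key in range(first, last + interval_ms, interval_ms)
    }


# One invented document in each of three buckets from the question:
# 14:00, 10:00 (next day) and 12:00 (next day), local time.
docs = [
    1659351600000 + 5 * 60_000,
    1659423600000 + 1,
    1659430800000 + 30 * 60_000,
]
buckets = data_bound_buckets(docs)
print(len(buckets), max(buckets))
```

The 11:00 bucket (key 1659427200000) is present with a count of 0 because it lies between two documents, while nothing exists after key 1659430800000, the bucket of the latest document.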

As @Val mentioned, you can use the extended_bounds setting, which extends the bounds of the histogram beyond the data itself. It only has an effect when min_doc_count is 0, as it is in the question.
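
The effect of extended_bounds can be sketched in Python as padding the bucket list out to the requested bounds. Again, this is a simplified model under assumptions: apply_extended_bounds is a made-up name, keys are epoch milliseconds, and flooring in UTC matches the +03:00 hourly buckets only because the offset is a whole number of hours. The non-empty buckets are the ones shown in the response in the question (the elided part is left out), and the bounds are the two dates from the range filter converted to epoch milliseconds.

```python
HOUR_MS = 3_600_000


def apply_extended_bounds(buckets, min_ms, max_ms, interval_ms=HOUR_MS):
    """Pad {bucket_key: doc_count} with empty buckets out to the given bounds."""
    lo = min_ms - min_ms % interval_ms  # bounds are rounded down to a bucket key
    hi = max_ms - max_ms % interval_ms
    if buckets:
        # The bounds only extend the histogram, they never cut buckets off.
        lo = min(lo, min(buckets))
        hi = max(hi, max(buckets))
    return {
        key: buckets.get(key, 0)
        for key in range(lo, hi + interval_ms, interval_ms)
    }


# Non-empty buckets visible in the response from the question.
response_buckets = {
    1659351600000: 450,
    1659355200000: 265,
    1659423600000: 31,
    1659430800000: 1,
}
padded = apply_extended_bounds(
    response_buckets,
    min_ms=1659351600000,  # 2022-08-01T11:00:00.000Z
    max_ms=1659438874158,  # 2022-08-02T11:14:34.158Z
)
for key in sorted(padded)[-3:]:
    print(key, padded[key])
```

The last lines printed are the buckets after the final document, which now exist with a count of 0 instead of being dropped.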
