Elasticsearch - prevent date histogram flooring


I want to do a rollup based on timestamp using a date_histogram aggregation and get the average of the values in each time range.

{
    "aggregations": {
        "timestamp": {
            "date_histogram": {
                "field": "timestamp",
                "interval": "300s",
                "offset": 0,
                "order": {
                    "_key": "asc"
                },
                "keyed": false,
                "min_doc_count": 0
            },
            "aggregations": {
                "myAgg": {
                    "avg": {
                        "field": "value"
                    }
                }
            }
        }
    }
}
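(Note: on Elasticsearch 7.2+ the combined interval parameter is deprecated in favor of fixed_interval and calendar_interval, so on a recent version the histogram part would be declared as:)

    "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "300s"
    }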

However, the problem is that even though the values (0.0 ~ 3.9) are floats, the aggregation result gives the average of the floored values (e.g. 1.1 -> 1, 2.2 -> 2).

Here is a sample result.


    "hits" : [
      {
        "_index" : "originals-20210915",
        "_type" : "_doc",
        "_score" : 5.6003222,
        "_source" : {
          "instanceName" : "LI9",
          "metric" : "tps",
          "value" : 0.36666667,
          "timestamp" : "1631717705000"
        }
      },
      {
        "_index" : "originals-20210915",
        "_type" : "_doc",
        "_score" : 5.5940228,
        "_source" : {
          "instanceName" : "DN3",
          "metric" : "tps",
          "value" : 2.8333333,
          "timestamp" : "1631717705000"
        }
      },
      {
        "_index" : "originals-20210915",
        "_type" : "_doc",
        "_score" : 5.5914664,
        "_source" : {
          "instanceName" : "LI9",
          "metric" : "tps",
          "value" : 0.13333334,
          "timestamp" : "1631717700000"
        }
      },
      {
        "_index" : "originals-20210915",
        "_type" : "_doc",
        "_score" : 5.5914664,
        "_source" : {
          "instanceName" : "DN3",
          "metric" : "tps",
          "value" : 3.3,
          "timestamp" : "1631717700000"
        }
      }
    ]
  },
  "aggregations" : {
    "instanceName" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "LI9",
          "doc_count" : 2,
          "timestamp" : {
            "buckets" : [
              {
                "key_as_string" : "2021-09-15T14:55:00.000Z",
                "key" : 1631717700000,
                "doc_count" : 2,
                "myAgg" : {
                  "value" : 0.0
                }
              }
            ]
          }
        },
        {
          "key" : "DN3",
          "doc_count" : 2,
          "timestamp" : {
            "buckets" : [
              {
                "key_as_string" : "2021-09-15T14:55:00.000Z",
                "key" : 1631717700000,
                "doc_count" : 2,
                "myAgg" : {
                  "value" : 2.5
                }
              }
            ]
          }
        }
      ]

I want to know if it's possible to get the exact average.

CodePudding user response:

Based on what you've shared, it's probably a mapping problem: your value field is most likely mapped as integer instead of float.

The _source is returned exactly as you indexed it, but in the background the value is cast to int in the doc values that aggregations read, so the average is computed over truncated values. Your sample confirms this: DN3's values 2.8333333 and 3.3 truncate to 2 and 3, whose average is exactly the 2.5 shown in the response.

You cannot change a field's type from integer to float on an existing index, so you need to create a new index with the correct mapping and use _reindex to copy your data into it.
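A minimal sketch of that migration, assuming a hypothetical new index name originals-20210915-v2:

PUT originals-20210915-v2
{
  "mappings": {
    "properties": {
      "value": {
        "type": "float"
      }
    }
  }
}

POST _reindex
{
  "source": {
    "index": "originals-20210915"
  },
  "dest": {
    "index": "originals-20210915-v2"
  }
}

Once the reindex finishes, you can delete the old index and add an alias named originals-20210915 that points at the new one, so existing queries keep working.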

Your mapping probably looks like this:

GET originals-20210915/_mapping
{
  "originals-20210915" : {
    "mappings" : {
      "properties" : {
        "value" : {
          "type" : "integer"
        }
      }
    }
  }
}

Instead of:

{
  "originals-20210915" : {
    "mappings" : {
      "properties" : {
        "value" : {
          "type" : "float" // long
        }
      }
    }
  }
}
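
If you cannot reindex right away and you're on Elasticsearch 7.11 or later, a search-time runtime field that re-reads the original number from _source should also give the exact average, at the cost of parsing _source for every document. A sketch (value_float is just an illustrative name):

GET originals-20210915/_search
{
  "size": 0,
  "runtime_mappings": {
    "value_float": {
      "type": "double",
      "script": "emit(((Number) params._source['value']).doubleValue())"
    }
  },
  "aggregations": {
    "timestamp": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "300s"
      },
      "aggregations": {
        "myAgg": {
          "avg": {
            "field": "value_float"
          }
        }
      }
    }
  }
}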