Home > Software engineering >  Elasticsearch aggregation with unqiue counting
Elasticsearch aggregation with unqiue counting

Time:07-28

My documents consist of a history of orders and their state, here a minimal example:

    {
      "orderNumber" : "xyz",
      "state" : "shipping",
      "day" : "2022-07-20",
      "timestamp" : "2022-07-20T15:06:44.290Z",
    }

the state can be strings like shipping, processing, redo,...

For every possible state, I need to count the number of orders that had this state at some point during a day, without counting a state twice for the same orderNumber that day (which can happen if there is a problem and it needs to start from the beginning that same day).

My aggregation looks like this:

GET order-history/_search
{
   "aggs": {
      "countDays": {
          "terms": {
              "field": "day",
              "order": {
                "_key": "desc"
              },
              "size": 20
          },
          "aggs": {
            "countStates": {
              "terms": {
                "field": "state.keyword",
                "size": 10
              }
              
            }
          }
      }
   }
   , "size": 1
}

However, this will count a state for a given orderNumber twice if it reappears that same day. How would I prevent it from counting a state twice for each orderNumber, if it is on the same day?

CodePudding user response:

Tldr;

I don't think there is a flexible and simple solution.

But if you know in advance the number of state that exists. Maybe through another aggregation query, to get all type of state.

You could do the following

POST /_bulk
{"index":{"_index":"73138766"}}
{"orderNumber":"xyz","state":"shipping","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"xyz","state":"redo","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"xyz","state":"shipping","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"bbb","state":"processing","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"bbb","state":"shipping","day":"2022-07-20"}


GET 73138766/_search
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "day",
        "calendar_interval": "day"
      },
      "aggs": {
        "shipping": {
          "filter": { "term": { "state.keyword": "shipping" }
          },
          "aggs": {
            "orders": {
              "cardinality": {
                "field": "orderNumber.keyword"
              }
            }
          }
        },
        "processing": {
          "filter": { "term": { "state.keyword": "processing" }
          },
          "aggs": {
            "orders": {
              "cardinality": {
                "field": "orderNumber.keyword"
              }
            }
          }
        },
        "redo": {
          "filter": { "term": { "state.keyword": "redo" }
          },
          "aggs": {
            "orders": {
              "cardinality": {
                "field": "orderNumber.keyword"
              }
            }
          }
        }
      }
    }
  }
}

You will obtain the following results

{
  "aggregations": {
    "per_day": {
      "buckets": [
        {
          "key_as_string": "2022-07-20T00:00:00.000Z",
          "key": 1658275200000,
          "doc_count": 5,
          "shipping": {
            "doc_count": 3,
            "orders": {
              "value": 2
            }
          },
          "processing": {
            "doc_count": 1,
            "orders": {
              "value": 1
            }
          },
          "redo": {
            "doc_count": 1,
            "orders": {
              "value": 1
            }
          }
        }
      ]
    }
  }
}
  • Related