Elasticsearch counts don't add up after using an aggregation

Time:07-05

I have an ES index, and I want to count the number of distinct CONTACT_ID values where the [Have Agreement] flag is Y and where it is N. The flag is unique for each contact. However, when I add up the contacts with the Y flag and the contacts with the N flag, the sum differs from the total number of distinct contacts.

1. Total distinct CONTACT_ID count:

POST /dashboard/_search?size=0
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "CREATED": {
              "gte": "2021-07-04T00:00:00.001Z",
              "lte": "2021-12-31T00:00:00.001Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "UniqueContact": {
      "cardinality": {
        "field": "CONTACT_ID.keyword"
      }
    }
  }
}

The result is 27588.
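To see what the cardinality aggregation is estimating, here is a minimal sketch that computes the exact equivalent of query 1 over a few in-memory documents: keep documents whose CREATED falls inside the inclusive range, then count distinct CONTACT_ID values. The sample documents and the helper names (parse, in_range, exact_unique_contacts) are hypothetical; only the field names come from the question.

```python
from datetime import datetime

# Hypothetical sample documents; field names mirror the index in the question.
DOCS = [
    {"CONTACT_ID": "3-QV3ZBW", "Have Agreement": "N", "CREATED": "2021-08-01T10:00:00.000Z"},
    {"CONTACT_ID": "3-QV3ZBW", "Have Agreement": "N", "CREATED": "2021-09-15T10:00:00.000Z"},
    {"CONTACT_ID": "3-QV3ZC3", "Have Agreement": "Y", "CREATED": "2021-10-02T10:00:00.000Z"},
    {"CONTACT_ID": "3-AAAAAA", "Have Agreement": "Y", "CREATED": "2021-01-02T10:00:00.000Z"},  # outside the range
]

FMT = "%Y-%m-%dT%H:%M:%S.%f%z"  # %z accepts a trailing "Z" on Python 3.7+


def parse(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp such as 2021-07-04T00:00:00.001Z."""
    return datetime.strptime(ts, FMT)


def in_range(doc: dict, gte: str, lte: str) -> bool:
    """Mimic the range query: gte <= CREATED <= lte, both bounds inclusive."""
    return parse(gte) <= parse(doc["CREATED"]) <= parse(lte)


def exact_unique_contacts(docs: list, gte: str, lte: str) -> int:
    """Exact distinct CONTACT_ID count; the cardinality aggregation only approximates this."""
    return len({d["CONTACT_ID"] for d in docs if in_range(d, gte, lte)})


GTE = "2021-07-04T00:00:00.001Z"
LTE = "2021-12-31T00:00:00.001Z"
print(exact_unique_contacts(DOCS, GTE, LTE))  # 2
```

Note that the two documents for 3-QV3ZBW count once: a distinct count collapses repeated contacts, which a document count does not.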

2. Distinct CONTACT_ID count for the Y and N flags respectively:

POST /dashboard/_search?size=0
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "CREATED": {
              "gte": "2021-07-04T00:00:00.001Z",
              "lte": "2021-12-31T00:00:00.001Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "CVID": {
      "terms": {
        "field": "Have Agreement.keyword",
        "order": {
          "type_count": "desc"
        }
      },
      "aggs": {
        "type_count": {
          "cardinality": {
            "field": "CONTACT_ID.keyword"
          }
        }
      }
    }
  }
}

The results are 2692 and 2158, which add up to only 4850.
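The terms aggregation with a cardinality sub-aggregation can be mimicked in plain Python to make the two numbers per bucket explicit: doc_count counts documents, while type_count counts distinct contacts. This is a sketch under assumptions: the documents are hypothetical and already date-filtered, the helper name flag_buckets is made up, and the counts are exact whereas Elasticsearch's cardinality is approximate. Documents without a Have Agreement value fall into no bucket, as with a terms aggregation that has no missing setting.

```python
from collections import defaultdict

# Hypothetical, already date-filtered documents (field names mirror the question).
DOCS = [
    {"CONTACT_ID": "3-QV3ZBW", "Have Agreement": "N"},
    {"CONTACT_ID": "3-QV3ZBW", "Have Agreement": "N"},
    {"CONTACT_ID": "3-QV3ZBW", "Have Agreement": "N"},
    {"CONTACT_ID": "3-QV3ZC3", "Have Agreement": "Y"},
    {"CONTACT_ID": "3-QV3ZC3", "Have Agreement": "Y"},
    {"CONTACT_ID": "3-QV3ZD1", "Have Agreement": "N"},
]


def flag_buckets(docs: list) -> list:
    """Mimic a terms agg on the flag with a cardinality sub-agg on CONTACT_ID.

    Returns (flag, doc_count, distinct_contacts) tuples sorted by distinct_contacts
    descending, like "order": {"type_count": "desc"}.
    """
    doc_count = defaultdict(int)
    contacts = defaultdict(set)
    for d in docs:
        flag = d.get("Have Agreement")
        if flag is None:  # no value for the field: the document lands in no bucket
            continue
        doc_count[flag] += 1
        contacts[flag].add(d["CONTACT_ID"])
    rows = [(f, doc_count[f], len(contacts[f])) for f in doc_count]
    return sorted(rows, key=lambda r: r[2], reverse=True)


for flag, docs_in_bucket, distinct in flag_buckets(DOCS):
    print(flag, docs_in_bucket, distinct)
```

Here the N bucket holds 4 documents but only 2 contacts, and the per-flag distinct counts (2 + 1) add up to the 3 distinct contacts because each contact carries a single flag.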

3. Evidence that the flag is unique for each contact:

POST /dashboard/_search?size=0
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "CREATED": {
              "gte": "2021-07-04T00:00:00.001Z",
              "lte": "2021-12-31T00:00:00.001Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "CVID": {
      "terms": {
        "field": "CONTACT_ID.keyword",
        "order": {
          "type_count": "desc"
        }
      },
      "aggs": {
        "type_count": {
          "cardinality": {
            "field": "Have Agreement.keyword"
          }
        }
      }
    }
  }
}
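The check in part 3 runs the aggregation the other way round: bucket by contact, then count distinct flags per contact. Since the terms aggregation returns only the top buckets (its size defaults to 10), here is a minimal sketch that checks every contact in a local sample and also reports each contact's document count. The documents and the helper names (flags_per_contact, contacts_with_mixed_flags) are hypothetical.

```python
from collections import defaultdict

# Hypothetical, already date-filtered documents.
DOCS = [
    {"CONTACT_ID": "3-QV3ZBW", "Have Agreement": "N"},
    {"CONTACT_ID": "3-QV3ZBW", "Have Agreement": "N"},
    {"CONTACT_ID": "3-QV3ZC3", "Have Agreement": "Y"},
    {"CONTACT_ID": "3-QV3ZD1", "Have Agreement": "N"},
    {"CONTACT_ID": "3-QV3ZD1", "Have Agreement": "Y"},  # breaks the "one flag per contact" assumption
]


def flags_per_contact(docs: list) -> dict:
    """Mimic a terms agg on CONTACT_ID with a cardinality sub-agg on the flag.

    Maps each contact to (doc_count, number_of_distinct_flags).
    """
    doc_count = defaultdict(int)
    flags = defaultdict(set)
    for d in docs:
        doc_count[d["CONTACT_ID"]] += 1
        flags[d["CONTACT_ID"]].add(d["Have Agreement"])
    return {c: (doc_count[c], len(flags[c])) for c in doc_count}


def contacts_with_mixed_flags(docs: list) -> list:
    """Contacts whose documents carry more than one distinct flag value."""
    return sorted(c for c, (_, n_flags) in flags_per_contact(docs).items() if n_flags > 1)


print(flags_per_contact(DOCS))
print(contacts_with_mixed_flags(DOCS))  # ['3-QV3ZD1']
```

An empty result from contacts_with_mixed_flags would confirm the uniqueness assumption, while a doc_count above 1 shows the same contact appearing in several documents.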

Answer:

The results seem coherent, according to your example.

Keep in mind that cardinality counts are approximate (you can set precision_threshold to gain some precision):

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
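As a sketch of that tuning, the helper below (a hypothetical name, unique_contact_body) rebuilds the body of query 1 as a Python dict and adds precision_threshold to the cardinality aggregation. Per the linked documentation, counts below the threshold are expected to be close to accurate; the default is 3000 and values above 40000 behave like 40000, with higher thresholds using more memory. Since 27588 is below 40000, the maximum threshold should bring that count close to exact. Putting "size": 0 in the body is equivalent to ?size=0 in the URL.

```python
import json


def unique_contact_body(gte: str, lte: str, precision_threshold: int = 40000) -> dict:
    """Search body for query 1 with an explicit precision_threshold on the cardinality agg."""
    return {
        "size": 0,  # same effect as ?size=0: return aggregations only
        "query": {"bool": {"must": [{"range": {"CREATED": {"gte": gte, "lte": lte}}}]}},
        "aggs": {
            "UniqueContact": {
                "cardinality": {
                    "field": "CONTACT_ID.keyword",
                    # counts below this threshold are expected to be close to exact
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


body = unique_contact_body("2021-07-04T00:00:00.001Z", "2021-12-31T00:00:00.001Z")
print(json.dumps(body, indent=2))  # paste the output after POST /dashboard/_search
```

The same precision_threshold key can be added to the type_count sub-aggregation in query 2.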

  1. You have around 27588 distinct contacts matching your query (the cardinality count is accurate to within roughly 5%).

  2. The top-level aggregation splits the documents by Y or N (Have Agreement.keyword).

In the result we can read:

  • 16725 documents with N
  • 11190 documents with Y
  • For the N group, you have around 2692 distinct contacts
  • For the Y group, you have around 2158 distinct contacts

So you have "duplicate" matching documents (several documents share the same CONTACT_ID), as we can see in your part 3:

  • 10 docs with CONTACT_ID 3-QV3ZBW
  • 10 docs with CONTACT_ID 3-QV3ZC3

=> So your second request is correct: you have around 2692 distinct contacts with the N value (and around 2158 with Y).

The 2692 contacts appear in 16725 documents, and the other 2158 appear in 11190 documents.

16725 + 11190 = 27915, which is within 27588 ± 5%.
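The comparison above is simple arithmetic; this small sketch spells it out. The 5% tolerance is the rough figure used in this answer, not a guarantee documented by Elasticsearch, and within_tolerance is a hypothetical helper.

```python
def within_tolerance(value: float, reference: float, rel_tol: float = 0.05) -> bool:
    """True if value lies within reference +/- rel_tol * reference."""
    return abs(value - reference) <= rel_tol * reference


# Figures quoted in this thread.
n_docs, y_docs, unique_contacts = 16725, 11190, 27588
total_docs = n_docs + y_docs
print(total_docs)  # 27915
print(within_tolerance(total_docs, unique_contacts))  # True
```

The difference is 327, well inside the 5% band of about 1379 around 27588.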

PS: Add a term query on 3-QV3ZBW, for example. I think this will answer your question with a simple example.
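One way to write that check, sketched as a Python dict under assumptions: query 2 is reused with an extra term clause on CONTACT_ID.keyword, and single_contact_body is a hypothetical helper name. If this answer's reading is right, the response should show a single flag bucket with a doc_count of about 10 and a type_count of 1.

```python
import json


def single_contact_body(contact_id: str, gte: str, lte: str) -> dict:
    """Query 2 restricted to one contact, to see how its documents split by flag."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [
                    {"range": {"CREATED": {"gte": gte, "lte": lte}}},
                    # exact match on the keyword sub-field, not the analyzed text field
                    {"term": {"CONTACT_ID.keyword": contact_id}},
                ]
            }
        },
        "aggs": {
            "CVID": {
                "terms": {"field": "Have Agreement.keyword"},
                "aggs": {"type_count": {"cardinality": {"field": "CONTACT_ID.keyword"}}},
            }
        },
    }


body = single_contact_body("3-QV3ZBW", "2021-07-04T00:00:00.001Z", "2021-12-31T00:00:00.001Z")
print(json.dumps(body, indent=2))  # paste the output after POST /dashboard/_search
```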