Home > Blockchain >  How to count aggregated fields in Kibana Elasticsearch/opensearch query DSL?
How to count aggregated fields in Kibana Elasticsearch/opensearch query DSL?

Time:11-30

I am using Kibana to view an Opensearch index with objects like:

timestamp:"November 3rd 2022, 23:50:51.253" client_id:"61c9aebdd01d" event:"login"
timestamp:"November 3rd 2022, 23:51:11.553" client_id:"61c9aebdd01d" event:"error"
timestamp:"November 3rd 2022, 23:52:19.982" client_id:"287a5ef458db" event:"login"
timestamp:"November 3rd 2022, 23:59:35.840" client_id:"61c9aebdd01d" event:"login"

I'd like to count unique client_ids with event "login"; so, using the data above, the count would be 2.

I'm able to count events matching "login" using AWS's Kibana interface, with the query DSL:

{
  "query": {
    "match": {
      "event": "login"
    }
  }
}

...that works fine, and produces the count 3.

But when I try to construct an aggregation per various documentation, like:

{
  "size": 0,
  "aggs": {
    "client_count": {
      "cardinality": {
        "field": "client_id"
      }
    }
  }
}

...I get a SearchError: Internal Server Error.

I've tried various variations on this. For instance, this works without an error:

{
  "size": 0,
  "query": {
    "match": {
      "event": "login"
    }
  },
  "aggs": {
    "client_count": {
      "cardinality": {
        "field": "client_id",
        "size": 0
      }
    }
  }
}

...but, it doesn't seem to actually report the count of unique client_ids, it just produces the same exact results as the first query above (that matches all "login" events).

Aggregation types "cardinality", "terms" and "global" all seem to produce the same error.

Any ideas what syntax I should be using?


P.S.: I looked at about 30 other Elasticsearch query questions, but none seemed to answer this question

P.P.S: I can't use syntax like

GET /my_index_here/_search
{
...

because it's not allowed in the Kibana interface: enter image description here

CodePudding user response:

Your query looks good. So your issue is related with how are you querying the data and not the query itself.

This is the step-by-step:

Mappings:

PUT test_cardinality
{
  "mappings": {
    "properties": {
      "client_id": {
        "type": "keyword"
      },
      "event": {
        "type": "keyword"
      }
    }
  }
}

Documents:

POST test_cardinality/_doc
{
  "client_id": "61c9aebdd01d",
  "event": "login"
}

POST test_cardinality/_doc
{
  "client_id": "287a5ef458db",
  "event": "login"
}

Query

GET test_cardinality/_search
{
  "size": 0,
  "aggs": {
    "client_count": {
      "cardinality": {
        "field": "client_id"
      }
    }
  }
}

Results:

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "client_count" : {
      "value" : 2
    }
  }
}

CodePudding user response:

The answer was:

  1. Use the Dev Tools interface in Kibana, NOT the Discover interface, which seems to not work for some queries for reasons I don't understand (maybe the result objects don't work for the default visualization?)

  2. Full query code:

GET /MY-INDEX-HERE/_search
{
  "size": 0,
  "query": {
    "match": {
      "event": "login"
    }
  },
  "aggs": {
    "client_count": {
      "cardinality": {
        "field": "client_id"
      }
    }
  }
}

And FYI, I ended up using this code, to isolate a particular period (the full year 2019):

GET /MY-INDEX-HERE/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "timestamp": {
              "gte": "1546300800000",
              "lt": "1577836800000"
            }
          }
        },
        {
          "match": {
            "event": "login"
          }
        }
      ]
    }
  },
  "aggs": {
    "client_count": {
      "cardinality": {
        "field": "client_id"
      }
    }
  }
}
  • Related