Home > Net >  Get count distinct by nested objects from Elasticsearch
Get count distinct by nested objects from Elasticsearch

Time:05-07

I have index with following mapping

{
    "mappings": {
        "properties": {
            "typed_obj": {
                "type": "nested",
                "properties": {
                    "id": {"type": "keyword"},
                    "type": {"type": "keyword"}
                }
            }
        }
    }
}

and documents

{"index" : {}}
{"typed_obj": [{"id": "1", "type": "one"}, {"id": "2", "type": "two"}]}
{"index" : {}}
{"typed_obj": [{"id": "1", "type": "one"}, {"id": "2", "type": "one"}]}
{"index" : {}}
{"typed_obj": [{"id": "1", "type": "one"}, {"id": "3", "type": "one"}]}
{"index" : {}}
{"typed_obj": [{"id": "1", "type": "one"}, {"id": "4", "type": "two"}]}

How can i group typed_obj by type and calculate unique id ? Smth like

{
 "type": "one",
 "count": 3
},
{
 "type": "two",
 "count": 2
}

I made up query with agg

{
    "query": {
        "match_all": {}
    },
    "aggs": {
        "obj_nested": {
            "nested": {
                "path": "typed_obj"
            },
            "aggs": {
                "by_type_and_id": {
                    "multi_terms": {
                        "terms": [
                            {
                                "field": "typed_obj.type"
                            },
                            {
                                "field": "typed_obj.id"
                            }
                        ]
                    }
                }
            }
        }
    },
    "size": 0
}

and it returns

"buckets": [
                    {
                        "key": [
                            "one",
                            "1"
                        ],
                        "key_as_string": "one|1",
                        "doc_count": 4
                    },
                    {
                        "key": [
                            "one",
                            "2"
                        ],
                        "key_as_string": "one|2",
                        "doc_count": 1
                    },
                    {
                        "key": [
                            "one",
                            "3"
                        ],
                        "key_as_string": "one|3",
                        "doc_count": 1
                    },
                    {
                        "key": [
                            "two",
                            "2"
                        ],
                        "key_as_string": "two|2",
                        "doc_count": 1
                    },
                    {
                        "key": [
                            "two",
                            "4"
                        ],
                        "key_as_string": "two|4",
                        "doc_count": 1
                    }
                ]

In backend app i can group keys by first element (it is typed_obj type) and then retriev length, but my question is - is it possible to get types count without obtain from index all id type pairs ?

CodePudding user response:

You need to use Cardinality aggregation to count distinct values.

Query:

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "obj_nested": {
      "nested": {
        "path": "typed_obj"
      },
      "aggs": {
        "type":{
          "terms": {
            "field": "typed_obj.type",
            "size": 10
          },
          "aggs": {
            "id": {
              "cardinality": {
                "field": "typed_obj.id"
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

Response

"aggregations" : {
    "obj_nested" : {
      "doc_count" : 8,
      "type" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "one",
            "doc_count" : 6,
            "id" : {
              "value" : 3
            }
          },
          {
            "key" : "two",
            "doc_count" : 2,
            "id" : {
              "value" : 2
            }
          }
        ]
      }
    }
  }

Note:

A single-value metrics aggregation that calculates an approximate count of distinct values.

  • Related