Home > Net >  GroupBy on a partition then count in Opensearch: Group By on multiple fields
GroupBy on a partition then count in Opensearch: Group By on multiple fields

Time:03-11

I have the following data

{
  "companyID" : "amz",
  "companyType" : "ret",
  "employeeID" : "ty-5a62fd78e8d20ad"
},
{
  "companyID" : "amz",
  "companyType" : "ret",
  "employeeID" : "ay-5a62fd78e8d20ad"
},
{
  "companyID" : "mic",
  "companyType" : "cse",
  "employeeID" : "by-5a62fd78e8d20ad"
},
{
  "companyID" : "ggl",
  "companyType" : "cse",
  "employeeID" : "ply-5a62fd78e8d20ad"
},
{
  "companyID" : "ggl",
  "companyType" : "cse",
  "employeeID" : "wfly-5a62ad"
}

I want the following result. basically combination of values like this mic-cse,ggl-cse,amz-ret .

"agg_by_company_type" : {
 "buckets" : [
 {
    "key" : "ret",
     "doc_count" : 1
 },
 {
   "key" : "cse",
    "doc_count" : 2
 }
]

How do I do it?

I have tried the following aggregations:

"agg_by_companyID_topHits": {
  "terms": {
    "field": "companyID.keyword",
    "size": 100000, 
    "min_doc_count": 1,
    "shard_min_doc_count": 0,
    "show_term_doc_count_error": true,
    "order": {
        "_key": "asc"
    }
  },
    "aggs": {
      "agg_by_companyType" : {
          "top_hits": {
            "size": 1, 
            "_source": {
                "includes": ["companyType"]
            }
          }
        }
    }
}

But this just gives me first groupBy of company id now on top of that data I want count of company type. this is the response I get

"agg_by_companyID_topHits" : {
  "doc_count_error_upper_bound" : 0,
  "sum_other_doc_count" : 0,
  "buckets" : [
    {
      "key" : "amz",
      "doc_count" : 2,
      "doc_count_error_upper_bound" : 0,
      "agg_by_companytype" : {
        "hits" : {
          "total" : {
            "value" : 2,
            "relation" : "eq"
          },
          "max_score" : 0.0,
          "hits" : [
            {
              "_index" : "my-index",
              "_type" : "_doc",
              "_id" : "uytuygjuhg",
              "_score" : 0.0,
              "_source" : {
                "companyType" : "ret"
              }
            }
          ]
        }
      }
    },
    {
      "key" : "mic",
      "doc_count" : 1,
      "doc_count_error_upper_bound" : 0,
      "agg_by_companytype" : {
        "hits" : {
          "total" : {
            "value" : 1,
            "relation" : "eq"
          },
          "max_score" : 0.0,
          "hits" : [
            {
              "_index" : "my-index",
              "_type" : "_doc",
              "_id" : "uytuygjuhg",
              "_score" : 0.0,
              "_source" : {
                "companyType" : "cse"
              }
            }
          ]
        }
      }
    },
    {
      "key" : "ggl",
      "doc_count" : 2,
      "doc_count_error_upper_bound" : 0,
      "agg_by_companytype" : {
        "hits" : {
          "total" : {
            "value" : 2,
            "relation" : "eq"
          },
          "max_score" : 0.0,
          "hits" : [
            {
              "_index" : "my-index",
              "_type" : "_doc",
              "_id" : "uytuygjuhg",
              "_score" : 0.0,
              "_source" : {
                "companyType" : "ret"
              }
            }
          ]
        }
      }
    },
  ]
}

If it were spark, it would be simple to partition by companyID, group it and then group by companyType and count to get the desired result but not sure how to do it in ES.

Important Note: I am working with Opensearch. Possible solution for this in elastic search multi-terms-aggregation is not available in versions before v7.12.

So wondering how it was done before this feature in ES.

We came across this issue because AWS migrated from ES to Opensearch.

CodePudding user response:

use multi_terms agg doc here

GET /products/_search
{
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "terms": [{
          "field": "companyID" 
        }, {
          "field": "companyType"
        }]
      }
    }
  }
}

CodePudding user response:

can you use script in terms agg ,like this:

GET b1/_search
{
  "aggs": {
    "genres": {
      "terms": {
        "script": {
          "source": "doc['companyID'].value doc['companyType'].value",
          "lang": "painless"
        }
      }
    }
  }
}
  • Related