Home > Blockchain >  How can you do a sum aggregation inside a "group by" aggregation in Elasticsearch?
How can you do a sum aggregation inside a "group by" aggregation in Elasticsearch?

Time:08-19

Hello :) We need a bit of help with an ElasticSearch query

So we have the following mappings:

 "document" : {
     "properties" : {
      "client" : {
       "type" : "keyword",
       "fields" : {
        "client_search" : {
         "type" : "text",
         "analyzer" : "eflow_nGram_analyzer"
        }
       },
       "normalizer" : "ci_normalizer"
      },
      "postings" : {
       "type" : "nested",
       "include_in_parent" : true,
       "properties" : {
        "amount" : {
         "type" : "double",
         "fields" : {
          "amount_search" : {
           "type" : "text",
           "analyzer" : "eflow_nGram_analyzer"
          }
         }
        },
        "product" : {
         "properties" : {
          "client" : {
           "type" : "keyword",
           "normalizer" : "ci_normalizer"
          },
          "cost" : {
           "type" : "double",
           "fields" : {
            "cost_search" : {
             "type" : "text",
             "analyzer" : "eflow_nGram_analyzer"
            }
           }
          },
          "description" : {
           "type" : "text"
          },
          "rno" : {
           "type" : "keyword",
           "normalizer" : "ci_normalizer"
          },
         }
        },
        "quantity" : {
         "type" : "double",
         "fields" : {
          "quantity_search" : {
           "type" : "text",
           "analyzer" : "eflow_nGram_analyzer"
          }
         }
        },

Document -> nested postings -> product, amount, quantity

The product has an id (rno), cost description.

What I want to do is group the documents by product ids and add the quantity.

For example if I have 2 documents, doc A and doc B

Doc A has 2 postings:

  1. posting 1 with quantity 1 and a product with rno X
  2. posting 2 with quantity 4 and a product with rno Y

Doc B has 2 postings:

  1. posting 1 with quantity 1 and a product with rno X
  2. posting 2 with quantity 3 and a product with rno Z

I want to group by the product.rno and only sum the quantities inside the postings of the grouped by product rno.

So I want:

  1. Group with product rno X and total quantity of 1 1 = 2
  2. Group with product rno Y and total quantity of 4
  3. Group with product rno Z and total quantity of 3

I have the following aggregations:

 "aggs": {
   "group_by_product_id": {
     "terms": {
       "field": "document.postings.product.rno"
     },
     "aggs": {
       "product_quantity_total": {
         "sum": {
           "field": "document.postings.quantity"
         }
       }
     }
   }
 }

But the total quantity is computed wrong as in it adds all the quantities from the postings and not from group by rno

So for the example above I would obtain:

  1. Group with product rno X and total quantity of 1 4 1 3 = 9
  2. Group with product rno Y and total quantity of 4 1 = 5
  3. Group with product rno Z and total quantity of 3 1 = 4

Do you have any idea how I could use elasticsearch to group by a field inside a nested structure (document.postings.product.rno), then add a sum aggregation to only sum on the fields (document.postings.quantity) of the matched items inside the grouped by?

CodePudding user response:

This is similar to this

And the aggregation structure should be something like this:

  "aggs": {
    "the_postings": {
      "nested": {
        "path": "document.postings"
      },
      "aggs": {
        "group_by_product_id": {
          "terms": {
            "field": "document.postings.product.rno"
          },
          "aggs": {
            "total_quantity": {
              "sum": {
                "field": "document.postings.quantity"
              }
            }
          }
        }
      }
    }
  }

So you need to go through one more layer, the nested postings, in order to get the right summarization of the quantity field which is on the same level as the product field (in which you 'group by' the product id)

  • Related