Home > Back-end >  Composite aggregation query with bucket_sort does not work properly
Composite aggregation query with bucket_sort does not work properly

Time:08-02

I have an index to store financial transactions:

{
  "mappings": {
    "_doc": {
      "properties": {
        "amount": {
          "type": "long"
        },
        "currencyCode": {
          "type": "keyword"
        },
        "merchantId": {
          "type": "keyword"
        },
        "merchantName": {
          "type": "text"
        },
        "partnerId": {
          "type": "keyword"
        },
        "transactionDate": {
          "type": "date"
        },
        "userId": {
          "type": "keyword"
        }
      }
    }
  }
}

Here's my query:

GET /transactions/_search
{
  "aggs": {
    "date_merchant": {
      "aggs": {
        "amount": {
          "sum": {
            "field": "amount"
          }
        },
        "amount_sort": {
          "bucket_sort": {
            "sort": [
              {
                "amount": {
                  "order": "desc"
                }
              }
            ]
          }
        },
        "top_hit": {
          "top_hits": {
            "_source": {
              "includes": [
                "merchantName",
                "currencyCode"
              ]
            },
            "size": 1
          }
        }
      },
      "composite": {
        "size": 1,
        "sources": [
          {
            "date": {
              "date_histogram": {
                "calendar_interval": "day",
                "field": "transactionDate"
              }
            }
          },
          {
            "merchant": {
              "terms": {
                "field": "merchantId"
              }
            }
          }
        ]
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "userId": "AAA"
          }
        },
        {
          "term": {
            "partnerId": "BBB"
          }
        },
        {
          "range": {
            "transactionDate": {
              "gte": "2022-07-01"
            }
          }
        },
        {
          "term": {
            "currencyCode": "EUR"
          }
        }
      ]
    }
  },
  "size": 0
}

Please note the "size": 1 in the composite aggregation.

If I change it to 3 (based on my data)... I get different results!

That means the bucket_sort operation doesn't work on the whole list of buckets, but just on the returned ones (if it's just one, that means it's not going to be sorted at all!)

How can I sort on ALL the buckets instead?

CodePudding user response:

Composite agg design

The composite aggregation is designed to iterate all buckets in the most efficient way possible.

How can I sort on ALL the buckets instead?

To fully sort over ALL buckets, all buckets would have to be enumerated ahead of time, defeating the design of the composite aggregation.

So, how to actually sort over all buckets?

Then aggregate over all buckets in a single call. Set your size to the largest number of buckets available within your query.

The number of buckets will be the cardinality of merchantId and the number of days in the date histogram.

Another option is to paginate over all the composite buckets and then sort them client side. If you choose this path, it may be good to have each page of the composite aggregation be sorted so that sorting them client side will be faster.

CodePudding user response:

I ended up following Benjamin's advice and my query now looks like a traditional aggregation.

enter image description here

Only problem: pagination server-side doesn't work for me (no after_key whatsoever). Any luck with that @Benjamin?

  • Related