Home > Back-end >  How can I filter bucket aggregation results based on sub-bucket aggregation document count?
How can I filter bucket aggregation results based on sub-bucket aggregation document count?

Time:08-14

I need a query where the results will exclude any userIds if they have at least 1 document with the tag set to a value within an 'excluded' list i.e. TAG A or TAG B.

I have an index with data like below:

{
  "_index": "tags-3",
  "_type": "_doc",
  "_id": "YYYYYYY",
  "_score": 10.272416,
  "_source": {
    "id": "YYYYYYY",
    "userId": "User1",
    "tag": "TAG A"
  }
},
{
  "_index": "tags-3",
  "_type": "_doc",
  "_id": "ZZZZZZ",
  "_score": 10.272416,
  "_source": {
    "id": "ZZZZZZ",
    "userId": "User1",
    "tag": "TAG B"
  },
{
  "_index": "tags-3",
  "_type": "_doc",
  "_id": "ZZZZZZ",
  "_score": 10.272416,
  "_source": {
    "id": "ZZZZZZ",
    "userId": "User2",
    "tag": "TAG A"
},
{
  "_index": "tags-3",
  "_type": "_doc",
  "_id": "ZZZZZZ",
  "_score": 10.272416,
  "_source": {
    "id": "ZZZZZZ",
    "userId": "User2",
    "tag": "TAG D"
},
{
  "_index": "tags-3",
  "_type": "_doc",
  "_id": "ZZZZZZ",
  "_score": 10.272416,
  "_source": {
    "id": "ZZZZZZ",
    "userId": "User4",
    "tag": "TAG D"
}

For the input above, I would expect an output of:

{
  "_index": "tags-3",
  "_type": "_doc",
  "_id": "ZZZZZZ",
  "_source": {
    "userId": "User4"
  }

since User4 has no documents with the tag set to TAG A or TAG B.

User4 is the only other user with a document with the tag set to TAG D however since it has another document with TAG B, it is excluded.

CodePudding user response:

One way to do this would be to:

  1. Aggregate (group) on the user IDs - this would give you all the user IDs
  2. Then, aggregate the documents for each user ID (nested aggregation) with a filter for the multiple (or single) tag values you want to exclude - this would give you the total sum of documents with the tag set to an excluded tag for each user ID
  3. Finally, perform a bucket selector aggregation, only including user IDs which have a count of 0 for any excluded documents; this would give you the users who don't have any documents with any excluded tag values

This query should work, for an excluded tag list of A, B & C:

{
  "aggs": {
    "user-ids": {
      "terms": {
        "field": "userId.keyword"
      },
      "aggs": {
        "excluded_tags_agg": {
          "filter": {
            "bool": {
              "should": [
                {
                  "match_phrase": {
                    "tag.keyword": "TAG A"
                  }
                },
                {
                  "match_phrase": {
                    "tag.keyword": "TAG B"
                  }
                },
                {
                  "match_phrase": {
                    "tag.keyword": "TAG C"
                  }
                }
              ],
              "minimum_should_match": 1
            }
          }
        },
        "filter_userids_which_do_not_have_any_docs_with_excluded_tags": {
          "bucket_selector": {
            "buckets_path": {
              "doc_count": "excluded_tags_agg > _count"
            },
            "script": "params.doc_count == 0"
          }
        }
      }
    }
  },
  "size": 0
}
  • Related