Home > Software design >  Unique conditional count
Unique conditional count

Time:10-07

I'm trying to get a unique count with an inclusion and exclusion condition. Imagine I want to get counts per groupId when place is equals to london but not paris. The following is an example of different documents in the same index.

[
  {
    "groupId": 123,
    "place": "london"
  },
  {
    "groupId": 123,
    "place": "berlin"
  },
  {
    "groupId": 456,
    "place": "london"
  },
  {
    "groupId": 789,
    "place": "london"
  },
  {
    "groupId": 789,
    "place": "paris"
  },
  {
    "groupId": 789,
    "place": "berlin"
  },
  {
    "groupId": ABC,
    "place": "tokyo"
  }
]

Output should be similar to:

[
  {
    "groupId": 123,
    "count": "1"
  },
  {
    "groupId": 456,
    "count": "1"
  }
]

"groupId": 789 is not included because there is one with place is paris, and "groupId": "ABC" is not included because it doesn't have any with london

CodePudding user response:

I have used following aggregations

1. Terms aggregation

2. filter aggregation

3. Bucket selector

In query section, I have filtered documents which have place as either London or Paris. This is to increase performance by removing documents which don't have either of these ex."groupId": "ABC"

In aggregation section, I have done a group by on groupId and then calculated count of London and Paris under it using filter aggregation. Using bucket selector I have only kept those group ids where count of London is at least 1 and count of Paris is zero.

Query

{
    "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "place": [
              "london",
              "paris"
            ]
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "groups": {
      "terms": {
        "field": "groupId",
        "size": 10
      },
      "aggs": {
        "london": {
          "filter": {
            "term": {
              "place": "london"
            }
          }
        },
        "paris": {
          "filter": {
            "term": {
              "place": "paris"
            }
          }
        },
        "buvket": {
          "bucket_selector": {
            "buckets_path": {
              "paris_count": "paris>_count",
              "london_count": "london>_count"
            },
            "script": "params.paris_count==0 && params.london_count>=1"
          }
        }
      }
    }
  }
}

Result

 "aggregations" : {
    "groups" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 123,
          "doc_count" : 1,
          "paris" : {
            "doc_count" : 0
          },
          "london" : {
            "doc_count" : 1
          }
        },
        {
          "key" : 456,
          "doc_count" : 1,
          "paris" : {
            "doc_count" : 0
          },
          "london" : {
            "doc_count" : 1
          }
        }
      ]
    }
  }
  • Related