Elasticsearch cardinality aggregation exception

Time:10-06

I am currently upgrading our ES clusters from version 6 to 7. Version 7 introduces a breaking change: accessing a missing document value from a script now throws an error. My goal is to change this query so that it only selects documents where those values exist, which should take care of the problem. How can I add a "must contain" (or "must not contain") condition to this query to achieve that?

   {
       "query":{
          "bool":{
             "must":[
                {
                   "terms":{
                      "state":[
                         "pending",
                         "queued",
                         "deferred"
                      ]
                   }
                },
                {
                   "terms":{
                      "tenant_tag":[
                         "prod"
                      ]
                   }
                }
             ]
          }
       },
       "aggs":{
          "count":{
             "cardinality":{
                "script":"doc['user_id'].value + '_' + doc['campaign_id'].value"
             }
          }
       }
    }

Answer:

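I would rewrite your query like this, adding an exists filter for each field the script reads (so documents missing either value never reach the aggregation) and putting all the clauses in filter context, since none of them need to affect scoring: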

{
       "query":{
          "bool":{
             "filter":[
                {
                  "exists": { "field": "user_id" }
                },
                {
                  "exists": { "field": "campaign_id" }
                },
                {
                   "terms":{
                      "state":[
                         "pending",
                         "queued",
                         "deferred"
                      ]
                   }
                },
                {
                   "terms":{
                      "tenant_tag":[
                         "prod"
                      ]
                   }
                }
             ]
          }
       },
       "aggs":{
          "count":{
             "cardinality":{
                "script":"doc['user_id'].value + '_' + doc['campaign_id'].value"
             }
          }
       }
    }
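If the request body is built from application code, generating the exists filters from the same list of fields the script concatenates keeps the two from drifting apart. The following is a minimal Python sketch under that assumption; build_count_query is a hypothetical helper, and it only assembles the body shown above (sending it to the cluster is left to whichever client you use).

```python
import json


def build_count_query(states, tenant_tags, key_fields):
    """Assemble the search body: filter clauses plus a scripted cardinality.

    Every field in key_fields is read by the script, so each one also gets an
    exists filter; that keeps documents without the value away from the
    script, which would otherwise throw in ES 7.
    """
    filters = [{"exists": {"field": f}} for f in key_fields]
    filters.append({"terms": {"state": list(states)}})
    filters.append({"terms": {"tenant_tag": list(tenant_tags)}})

    # Painless expression of the form: doc['a'].value + '_' + doc['b'].value
    script = " + '_' + ".join(f"doc['{f}'].value" for f in key_fields)

    return {
        "query": {"bool": {"filter": filters}},
        "aggs": {"count": {"cardinality": {"script": script}}},
    }


if __name__ == "__main__":
    body = build_count_query(
        ["pending", "queued", "deferred"], ["prod"], ["user_id", "campaign_id"]
    )
    print(json.dumps(body, indent=2))
```

Adding a third ID to key_fields would extend both the script and the exists filters in one place.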

Ideally, you should pre-compute the userid_campaignid field in your documents at index time, so you don't need a scripted aggregation at all. Scripts are evaluated for every matching document and perform poorly, and the cardinality aggregation is already expensive on its own.
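A minimal sketch of that pre-computation, assuming it is done client-side in Python before the documents are indexed. The helper with_combined_key is hypothetical, and mapping userid_campaignid as a keyword field is an assumption; the same concatenation could also be done server-side, for example in an ingest pipeline.

```python
def with_combined_key(doc, sep="_"):
    """Return a copy of doc with a userid_campaignid field added.

    The field is only set when both IDs are present; documents missing either
    one are returned unchanged (apart from being copied).
    """
    out = dict(doc)  # shallow copy so the caller's document is not mutated
    user_id, campaign_id = doc.get("user_id"), doc.get("campaign_id")
    if user_id is not None and campaign_id is not None:
        out["userid_campaignid"] = f"{user_id}{sep}{campaign_id}"
    return out


# Once the field is indexed (assumed here to be mapped as keyword), the
# aggregation can read it directly instead of running a script per document.
COUNT_AGG = {"count": {"cardinality": {"field": "userid_campaignid"}}}

docs = [
    {"user_id": 1, "campaign_id": 7},
    {"user_id": 1, "campaign_id": 7},
    {"user_id": 2, "campaign_id": 7},
    {"user_id": 3},  # missing campaign_id: no combined key is added
]
prepared = [with_combined_key(d) for d in docs]
distinct = {d["userid_campaignid"] for d in prepared if "userid_campaignid" in d}
print(len(distinct))  # 2, the exact distinct count that cardinality approximates
```

Documents missing either ID never get the combined field, so a cardinality aggregation on that field simply contributes nothing for them instead of throwing.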
