Home > Enterprise >  Elasticsearch - not getting expected aggregated count
Elasticsearch - not getting expected aggregated count

Time:09-10

In a scenario, I have to search for phone numbers that start with '40'. I need to get a matching phone number that starts with '40' and a count of the phone number that is matching.

Actually, I want to search in multiple fields, just, for example, I am searching only for phone numbers.

For that, I used the below query.

GET emp_details_1_1/_msearch 
{
   "index":"emp_details_1_1"
}{
   "_source":[
      
   ],
   "size":0,
   "min_score":1,
   "query":{
      "multi_match":{
         "query":"40",
         "fields":[
            "phone"
         ],
         "type":"phrase_prefix"
      }
   },
   "aggs":{
      "phone":{
         "terms":{
            "field":"phone.keyword",
            "include":"40.*"
         }
      },
      "phone_count":{
         "value_count":{
            "field":"phone.keyword"
         }
      }
   }
}

I am using Value Count aggregation for field-wise total count.

In the output, I can see the phone number data starting with '40', ie one single record. Example '40x-xxx-xxxx' But When I see the count, the matching count is '4'. Because while aggregating the query is considering a phone number that starts with '40' and also a phone number that has '40' in between after the dash '-'. Example: 'xxx-40x-xxxx','xxx-xxx-40x','xxx-xxx-40x'. While getting the aggregate count, I want to omit the phone numbers that have '40' in between.

Below is the output, I am getting.

{
  "took" : 70,
  "responses" : [
    {
      "took" : 70,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 4,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "phone_count" : {
          "value" : 4
        },
        "phone" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "4034487929",
              "doc_count" : 1
            }
          ]
        }
      },
      "status" : 200
    }
  ]
}

I tried various options, but not getting the expected results.

CodePudding user response:

Instead of match_prefix query(which can be done only on text fields(analyzed)), you need to use the prefix query(done on keyword fields), good that you are already having the keyword field for your phone field, hence changing your query to below query will provide your correct results.

{
    "size": 0,
    "min_score": 1,
    "query": {
        "prefix": {
            "phone.keyword": "40"
        }
    },
    "aggs": {
        "phone": {
            "terms": {
                "field": "phone.keyword",
                 "include":"40.*"
            }
        },
        "phone_count": {
            "value_count": {
                "field": "phone.keyword"
            }
        }
    }
}

And result

 "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "phone_count": {
            "value": 1
        },
        "phone": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "4034487929",
                    "doc_count": 1
                }
            ]
        }
    }
  • Related