Home > database >  Count the number of elements in a list field in Elastic Search
Count the number of elements in a list field in Elastic Search

Time:12-11

I'm still learning to use DSL queries in ElasticSearch. I have documents where one field is a list. I need to count the number of documents that have one element in this field, two elements in this field, etc. For example, here is a document structure:

Document1:

"Volume": [
{
"partition": "s1",
"fieldtype": ["A","B"]
}
]

Document 2:

"Volume": [
{
"partition": "s1",
"fieldtype": ["A"]
}
]

Document 3:

"Volume": [
{
"partition": "s1",
"fieldtype": ["B"]
}
]

I need a way to calculate that there is one document with 2 elements in fieldtype field and 2 documents with one element in fieldtype. If I try to aggregate them like this:

"size":0,
"aggs": {
"name": {
"terms": {
"field": "fieldtype.keyword"
}
}
}

I get counts of elements (number of As and Bs). Without using keyword, I get an error.

CodePudding user response:

I imagined that you work with nested type. Below is my solution:

PUT idx_test
{
  "mappings": {
    "properties": {
      "Volume": {
        "type": "nested"
      }
    }
  }
}

POST idx_test/_bulk
{"index":{ "_id": 1}}
{"Volume":[{"partition": "s1","fieldtype": ["A","B"]}]}
{"index":{ "_id": 2}}
{"Volume":[{"partition": "s1","fieldtype": ["A"]}]}
{"index":{ "_id": 3}}
{"Volume":[{"partition": "s1","fieldtype": ["B"]}]}

GET idx_test/_search
{
  "size": 0,
  "aggs": {
    "doc_id": {
      "terms": {
        "field": "_id",
        "size": 10
      },
      "aggs": {
        "volumes": {
          "nested": {
            "path": "Volume"
          },
          "aggs": {
            "size": {
              "sum": {
                "script": {
                  "lang": "painless",
                  "source": "doc['Volume.fieldtype.keyword'].size()"
                }
              }
            }
          }
        }
      }
    }
  }
}

Respons:

 "aggregations" : {
    "doc_id" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1",
          "doc_count" : 1,
          "volumes" : {
            "doc_count" : 1,
            "size" : {
              "value" : 2.0
            }
          }
        },
        {
          "key" : "2",
          "doc_count" : 1,
          "volumes" : {
            "doc_count" : 1,
            "size" : {
              "value" : 1.0
            }
          }
        },
        {
          "key" : "3",
          "doc_count" : 1,
          "volumes" : {
            "doc_count" : 1,
            "size" : {
              "value" : 1.0
            }
          }
        }
      ]
    }
  }

CodePudding user response:

@rabbitbr provided a good answer, but I could not understand why we tried to use a nested field. And, I think we need to use terms aggregation instead of sum here. Anyhow, here is a solution without nested :

PUT idx_test

POST idx_test/_bulk
{"index":{ "_id": 1}}
{"Volume":[{"partition": "s1","fieldtype": ["A","B"]}]}
{"index":{ "_id": 2}}
{"Volume":[{"partition": "s1","fieldtype": ["A"]}]}
{"index":{ "_id": 3}}
{"Volume":[{"partition": "s1","fieldtype": ["B"]}]}

GET idx_test/_mapping

GET idx_test/_search
{
  "size": 0,
  "aggs": {
    "size": {
      "terms": {
        "script": {
          "lang": "painless",
          "source": "doc['Volume.fieldtype.keyword'].size()"
        }
      }
    }
  }
}

Without using keyword, I get an error.

This is normal because without keyword you are trying to build an aggregation on a field whose type is text.

Here the response for the query above which is pretty basic query :

{
  ....
  "aggregations": {
    "size": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 2
        },
        {
          "key": "2",
          "doc_count": 1
        }
      ]
    }
  }
}

As you can see, we have 2 documents with 1 sized array and we have 1 document with 2 sized array.

  • Related