Elasticsearch aggregation by array size

Time:03-05

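I need some statistics from Elasticsearch, but I can't work out how to write the request.

I would like to know the number of people per appointment, that is, how many appointments have one person, two persons, and so on.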

Sample document from the appointment index:

{
  "id" : "383577",
  "persons" : [
    {
      "id" : "1"
    },
    {
      "id" : "2"
    }
  ]
}

What I would like:

"buckets" : [
{
  "key" : "1", <--- appointment of 1 person
  "doc_count" : 1241891
},
{
  "key" : "2", <--- appointment of 2 persons
  "doc_count" : 10137
},
{
  "key" : "3", <--- appointment of 3 persons
  "doc_count" : 8064
}

]

Thank you

Answer:

The easiest way to do this is to add an integer field containing the length of the persons array and aggregate on that field.

{
  "id" : "383577",
  "personsCount": 2,            <---- add this field
  "persons" : [
    {
      "id" : "1"
    },
    {
      "id" : "2"
    }
  ]
}
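
With that field in place, the aggregation itself could look like the sketch below. It assumes the index is called index (the same placeholder used elsewhere in this answer) and that personsCount is mapped as a numeric field; the aggregation name persons_per_appointment is only illustrative.

```json
GET index/_search
{
  "size": 0,
  "aggs": {
    "persons_per_appointment": {
      "terms": {
        "field": "personsCount"
      }
    }
  }
}
```

Setting "size": 0 skips the search hits so that only the buckets are returned. A terms aggregation returns the top 10 buckets by default, so add a "size" inside "terms" if you expect more distinct counts than that.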

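Alternatively, a script can compute the length of the persons array at query time. Be aware that this is slower than aggregating on a stored field, because the script runs for every matching document, and it can put significant load on your cluster depending on the volume of data you have. The script reads doc values, so persons.id must be a field that has them (with the default dynamic mapping for strings, that would be persons.id.keyword):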

GET /_search
{
  "aggs": {
    "persons": {
      "terms": {
        "script": "doc['persons.id'].size()"
      }
    }
  }
}

To add that field to all your existing documents, you can run an update by query like this:

POST index/_update_by_query
{
  "script": {
    "source": "ctx._source.personsCount = ctx._source.persons.length"
  }
}

However, you'll also need to modify the logic of your indexing application so that new documents are indexed with that field.
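
If changing the application is not convenient, one alternative (not part of the original suggestion) is to let Elasticsearch compute the field at index time with an ingest pipeline. The following is a minimal sketch; the pipeline name persons-count is hypothetical, and it assumes persons is either absent or an array in the incoming document.

```json
PUT _ingest/pipeline/persons-count
{
  "description": "Store the length of the persons array in personsCount",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.personsCount = ctx.persons == null ? 0 : ctx.persons.size()"
      }
    }
  ]
}

PUT index/_doc/383577?pipeline=persons-count
{
  "id" : "383577",
  "persons" : [
    { "id" : "1" },
    { "id" : "2" }
  ]
}
```

The pipeline can also be applied to every indexing request by setting index.default_pipeline on the index, which avoids passing the pipeline parameter each time.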
