Elasticsearch - Count number of occurrences per field per document

Time:01-24

Is it possible to count the number of occurrences of each distinct value in a list field, per document?

For example, given the following data:

[
  {
    "page": 1,
    "colors": [
      {
        "color": "red"
      },
      {
        "color": "white"
      },
      {
        "color": "red"
      }
    ]
  },
  {
    "page": 2,
    "colors": [
      {
        "color": "yellow"
      },
      {
        "color": "yellow"
      }
    ]
  }
]

Is it possible to get a result like the following:

[
  {
    "page": 1,
    "colors_count": [
      {
        "Key": "red",
        "Count": 2
      },
      {
        "Key": "white",
        "Count": 1
      }
    ]
  },
  {
    "page": 2,
    "colors_count": [
      {
        "Key": "yellow",
        "Count": 2
      }
    ]
  }
]

I tried using a terms aggregation, but it only counted each distinct value once per document, so for page 1 I got red: 1 and white: 1 instead of red: 2 and white: 1.

CodePudding user response:

Yes, it is possible. You will have to map colors as a nested field and use a nested aggregation, so that each element of the array is counted as its own document.
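To see why the nested type matters, here is a rough Python simulation of the two counting behaviours. It does not talk to Elasticsearch at all; the function names are made up for illustration, and the sample data is the one from the question.

```python
from collections import Counter

# Sample documents from the question.
pages = [
    {"page": 1, "colors": [{"color": "red"}, {"color": "white"}, {"color": "red"}]},
    {"page": 2, "colors": [{"color": "yellow"}, {"color": "yellow"}]},
]

def count_as_object_field(doc):
    """Mimic a terms aggregation on a non-nested field: a document
    contributes at most 1 to the bucket of each distinct value."""
    return Counter({c["color"] for c in doc["colors"]})

def count_as_nested_field(doc):
    """Mimic a terms aggregation inside a nested aggregation: every array
    element is treated as its own document, so repeats are counted."""
    return Counter(c["color"] for c in doc["colors"])

flat_counts = {d["page"]: dict(count_as_object_field(d)) for d in pages}
nested_counts = {d["page"]: dict(count_as_nested_field(d)) for d in pages}

print(flat_counts)
print(nested_counts)
```

The first dictionary reproduces the red: 1, white: 1 result described in the question; the second gives the per-occurrence counts that the nested mapping and aggregation below are meant to produce.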

Mapping:

PUT colors
{
  "mappings": {
    "properties": {
      "page" : { "type": "keyword" },
      "colors": { 
        "type": "nested",
        "properties": {
          "color": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Insert Documents:

PUT colors/_doc/1
{
  "page": 1,
  "colors": [
    {
      "color": "red"
    },
    {
      "color": "white"
    },
    {
      "color": "red"
    }
  ]
}

PUT colors/_doc/2
{
  "page": 2,
  "colors": [
    {
      "color": "yellow"
    },
    {
      "color": "yellow"
    }
  ]
}

Query:

GET colors/_search 
{
  "size": 0,
  "aggs": {
    "groupByPage": {
      "terms": {
        "field": "page"
      },
      "aggs": {
        "colors": {
          "nested": {
            "path": "colors"
          },
          "aggs": {
            "genres": {
              "terms": {
                "field": "colors.color"
              }
            }
          }
        }
      }
    }
  }
}

Output:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "groupByPage": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1", // page field value
          "doc_count": 1,
          "colors": {
            "doc_count": 3,
            "genres": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "red",
                  "doc_count": 2
                },
                {
                  "key": "white",
                  "doc_count": 1
                }
              ]
            }
          }
        },
        {
          "key": "2", // page field value
          "doc_count": 1,
          "colors": {
            "doc_count": 2,
            "genres": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "yellow",
                  "doc_count": 2
                }
              ]
            }
          }
        }
      ]
    }
  }
}
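If you need exactly the page / colors_count shape from the question, the aggregation response can be reshaped on the client side. The following is a minimal Python sketch, assuming the response has already been parsed into a dict and that the aggregation names are the ones used in the query above (groupByPage, colors, genres). The helper name to_colors_count and the trimmed sample_response are hypothetical.

```python
def to_colors_count(response):
    """Flatten the groupByPage > colors > genres buckets into
    [{"page": ..., "colors_count": [{"Key": ..., "Count": ...}]}]."""
    pages = []
    for page_bucket in response["aggregations"]["groupByPage"]["buckets"]:
        color_buckets = page_bucket["colors"]["genres"]["buckets"]
        pages.append({
            "page": page_bucket["key"],
            "colors_count": [
                {"Key": b["key"], "Count": b["doc_count"]} for b in color_buckets
            ],
        })
    return pages

# Trimmed copy of the output above, keeping only the fields the helper reads.
sample_response = {
    "aggregations": {
        "groupByPage": {
            "buckets": [
                {"key": "1", "doc_count": 1, "colors": {"doc_count": 3, "genres": {"buckets": [
                    {"key": "red", "doc_count": 2},
                    {"key": "white", "doc_count": 1},
                ]}}},
                {"key": "2", "doc_count": 1, "colors": {"doc_count": 2, "genres": {"buckets": [
                    {"key": "yellow", "doc_count": 2},
                ]}}},
            ]
        }
    }
}

result = to_colors_count(sample_response)
print(result)
```

Note that page comes back as the string "1" rather than the number 1, because the field is mapped as keyword. Also keep in mind that a terms aggregation returns only the top 10 buckets by default, so with more pages or more colors you would need to raise its size parameter.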