Can Elasticsearch do aggregations within a document?

Time:07-12

I have a mapping like this:

"mappings": {
    "seller": {
        "properties": {
            "overallRating": {"type": "byte"},
            "items": {
                "properties": {
                    "itemName": {"type": "string"},
                    "itemRating": {"type": "byte"}
                }
            }
        }
    }
}

Each item will only have one itemRating. Each seller will only have one overall rating. There can be many items, and at most I'm expecting maybe 50 items with itemRatings. Not all items have to have an itemRating.

I'm trying to get an average rating for each seller that combines all itemRatings and the overallRating. I have looked into aggregations, but all I have seen are aggregations across all documents. The aggregation I'm looking for works within a single document, and I am not sure whether that is possible. Any tips would be appreciated.
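To pin down the number being asked for, here is a minimal client-side sketch of one reading of "combines all itemRatings and the overallRating". It assumes the overall rating counts as one more value in the pool (not as a weight) and that items without an itemRating are skipped; `combined_rating` and the sample seller are hypothetical.

```python
from typing import Optional


def combined_rating(seller: dict) -> Optional[float]:
    """Pool the seller's overallRating with every itemRating present and average them.

    Assumption: the overall rating counts as one more value in the pool
    (not as a weight), and items without an itemRating are skipped.
    """
    ratings = []
    if seller.get("overallRating") is not None:
        ratings.append(seller["overallRating"])
    for item in seller.get("items", []):
        # Not every item carries an itemRating, so skip the missing ones.
        if item.get("itemRating") is not None:
            ratings.append(item["itemRating"])
    return sum(ratings) / len(ratings) if ratings else None


# Hypothetical seller: two rated items and one unrated item.
seller = {
    "overallRating": 4,
    "items": [
        {"itemName": "labrador", "itemRating": 5},
        {"itemName": "saint bernard"},
        {"itemName": "poodle", "itemRating": 3},
    ],
}
print(combined_rating(seller))  # (4 + 5 + 3) / 3 -> 4.0
```

With at most around 50 rated items per seller, this is cheap to compute, so the question is mainly whether Elasticsearch can do it server-side per document.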

Answer:

Yes, this is possible with Elasticsearch. To produce a combined rating, bucket the results by document ID and run a sub-aggregation inside each bucket. Each bucket then contains only one document, which is exactly what you want.

Here is an example:

Create the index:

PUT /ratings
{
  "mappings": {
    "properties": {
      "overallRating": {"type": "float"},
      "items": {
        "type": "nested",
        "properties": {
          "itemName": {"type": "keyword"},
          "itemRating": {"type": "float"},
          "overallRating": {"type": "float"}
        }
      }
    }
  }
}

Add some data:

POST ratings/_doc/
{
  "overallRating" : 1,
  "items" : [
    {
      "itemName" : "labrador",
      "itemRating" : 10,
      "overallRating" : 1
    },
    {
      "itemName" : "saint bernard",
      "itemRating" : 20,
      "overallRating" : 1
    }
  ]
}

POST ratings/_doc/
{
  "overallRating" : 1,
  "items" : [
    {
      "itemName" : "cat",
      "itemRating" : 5,
      "overallRating" : 1
    },
    {
      "itemName" : "rat",
      "itemRating" : 10,
      "overallRating" : 1
    }
  ]
}
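If you have more than a couple of sellers, indexing them one POST at a time gets tedious. A minimal sketch, assuming you send the result to `POST /_bulk` yourself with `Content-Type: application/x-ndjson` (the HTTP call is not shown); `to_bulk_ndjson` is a hypothetical helper, and the documents are the two from above.

```python
import json


def to_bulk_ndjson(index: str, docs: list) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk API.

    Each document gets an action line ({"index": {"_index": ...}}) followed by
    the document source; the body must end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


docs = [
    {"overallRating": 1, "items": [
        {"itemName": "labrador", "itemRating": 10, "overallRating": 1},
        {"itemName": "saint bernard", "itemRating": 20, "overallRating": 1},
    ]},
    {"overallRating": 1, "items": [
        {"itemName": "cat", "itemRating": 5, "overallRating": 1},
        {"itemName": "rat", "itemRating": 10, "overallRating": 1},
    ]},
]
body = to_bulk_ndjson("ratings", docs)
print(body)
```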

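Query the index for a combined rating per document (composite buckets come back ordered by their source key, here the document ID, so they are not sorted by rating):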

GET ratings/_search
{
  "size": 0, 
  "query": {
    "match_all": {}
  },
  "aggs": {
    "average_rating": {
      "composite": {
        "sources": [
          {
            "ids": {
              "terms": {
                "field": "_id"
              }
            }
          }
        ]
      },
      "aggs": {
        "average_rating": {
          "nested": {
            "path": "items"
          },
          "aggs": {
            "avg": {
              "avg": {
                "field": "items.compound"
              }
            }
          }
        }
      }
    }
  }, 
  "runtime_mappings": {
    "items.compound": {
      "type": "double",
      "script": {
        "source": "emit(doc['items.overallRating'].value + doc['items.itemRating'].value)"
      }
    }
  }
}

The result (please note that I changed some rating values between writing this answer and running it in the console, so the averages do not all match the sample data above):

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "average_rating" : {
      "after_key" : {
        "ids" : "3vUp44EBbR3hrRYkA8pj"
      },
      "buckets" : [
        {
          "key" : {
            "ids" : "3_Up44EBbR3hrRYkLsrC"
          },
          "doc_count" : 1,
          "average_rating" : {
            "doc_count" : 2,
            "avg" : {
              "value" : 151.0
            }
          }
        },
        {
          "key" : {
            "ids" : "3vUp44EBbR3hrRYkA8pj"
          },
          "doc_count" : 1,
          "average_rating" : {
            "doc_count" : 2,
            "avg" : {
              "value" : 8.5
            }
          }
        }
      ]
    }
  }
}
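On the client side you will usually want a plain mapping from document ID to rating. Because composite buckets are ordered by document ID, ranking sellers by rating has to happen after the fact. A sketch, assuming the aggregation names used in the query above; `ratings_by_doc` and `rank_by_rating` are hypothetical helpers, and the JSON is a trimmed copy of the response shown. An avg aggregation reports a null value when there is nothing to average, so those entries are dropped before sorting.

```python
import json


def ratings_by_doc(response: dict) -> dict:
    """Map each document ID to the averaged rating in its composite bucket."""
    buckets = response["aggregations"]["average_rating"]["buckets"]
    return {b["key"]["ids"]: b["average_rating"]["avg"]["value"] for b in buckets}


def rank_by_rating(ratings: dict) -> list:
    """Sort (doc_id, rating) pairs from highest to lowest, skipping null averages."""
    rated = [(doc_id, value) for doc_id, value in ratings.items() if value is not None]
    return sorted(rated, key=lambda kv: kv[1], reverse=True)


# Trimmed copy of the aggregation part of the response above.
response = json.loads("""
{
  "aggregations": {
    "average_rating": {
      "after_key": {"ids": "3vUp44EBbR3hrRYkA8pj"},
      "buckets": [
        {"key": {"ids": "3_Up44EBbR3hrRYkLsrC"}, "doc_count": 1,
         "average_rating": {"doc_count": 2, "avg": {"value": 151.0}}},
        {"key": {"ids": "3vUp44EBbR3hrRYkA8pj"}, "doc_count": 1,
         "average_rating": {"doc_count": 2, "avg": {"value": 8.5}}}
      ]
    }
  }
}
""")
print(rank_by_rating(ratings_by_doc(response)))
```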

One change for convenience:

I edited your mapping to copy the overallRating into each items entry. This simplifies the calculation that follows, because the script only looks in the nested scope and never has to step out to the parent document.
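If your source data only stores overallRating on the seller, you would need to copy it onto each item before indexing. A minimal sketch of that preprocessing step, assuming you control the indexing code; `denormalize_overall_rating` is a hypothetical helper, and keeping the two copies in sync on updates is left to you.

```python
import copy


def denormalize_overall_rating(seller: dict) -> dict:
    """Return a copy of the seller with overallRating copied onto every item."""
    doc = copy.deepcopy(seller)  # leave the caller's dict untouched
    for item in doc.get("items", []):
        item["overallRating"] = doc.get("overallRating")
    return doc


seller = {"overallRating": 1, "items": [{"itemName": "cat", "itemRating": 5}]}
print(denormalize_overall_rating(seller))
```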

I also used a runtime mapping (items.compound) to add each item's overallRating to its itemRating. The nested avg aggregation then averages those sums across all items in the document.
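To sanity-check what the runtime field plus nested avg computes, here is the same arithmetic in plain Python on the two sample documents. It is a sketch, not the server-side logic: `compound_average` and the IDs `doc1`/`doc2` are hypothetical (the real IDs are auto-generated), and skipping items that lack a rating is an assumption, since the Painless script in the query does not guard against missing values. Note that because every item carries the same overallRating, this works out to overallRating plus the mean item rating (1 + 15 = 16.0 for the first document), which is not the same as pooling 1, 10 and 20 into one average (about 10.33).

```python
def compound_average(doc: dict):
    """Average overallRating + itemRating over one document's items.

    Mirrors the items.compound runtime field followed by the nested avg.
    Items lacking either rating are skipped (an assumption).
    """
    sums = [
        item["overallRating"] + item["itemRating"]
        for item in doc.get("items", [])
        if "overallRating" in item and "itemRating" in item
    ]
    return sum(sums) / len(sums) if sums else None


# The two sample documents indexed earlier, under hypothetical IDs.
docs = {
    "doc1": {"overallRating": 1, "items": [
        {"itemName": "labrador", "itemRating": 10, "overallRating": 1},
        {"itemName": "saint bernard", "itemRating": 20, "overallRating": 1},
    ]},
    "doc2": {"overallRating": 1, "items": [
        {"itemName": "cat", "itemRating": 5, "overallRating": 1},
        {"itemName": "rat", "itemRating": 10, "overallRating": 1},
    ]},
}
for doc_id, doc in docs.items():
    print(doc_id, compound_average(doc))  # doc1 16.0, doc2 8.5
```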

I used a top-level composite aggregation on _id so that results are bucketed per document (which is what you want).
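One thing to keep in mind with a composite aggregation: it pages through buckets, returning 10 per response by default (adjustable with its `size` parameter). To fetch the next page you send the same request again with the response's `after_key` set as the composite's `after` parameter. A sketch that only builds the next request body, assuming the aggregation name `average_rating` from the query above; `next_page_body` is a hypothetical helper and the HTTP call is not shown.

```python
import copy


def next_page_body(request_body: dict, response: dict):
    """Return the request body for the next page of composite buckets, or None.

    Copies the response's after_key into the composite "after" parameter;
    stops when there is no after_key or the last page had no buckets.
    """
    agg = response.get("aggregations", {}).get("average_rating", {})
    after_key = agg.get("after_key")
    if after_key is None or not agg.get("buckets"):
        return None  # no more pages
    body = copy.deepcopy(request_body)  # do not mutate the original request
    body["aggs"]["average_rating"]["composite"]["after"] = after_key
    return body


request = {
    "size": 0,
    "aggs": {"average_rating": {"composite": {
        "sources": [{"ids": {"terms": {"field": "_id"}}}]
    }}},
}
page = {"aggregations": {"average_rating": {
    "after_key": {"ids": "3vUp44EBbR3hrRYkA8pj"},
    "buckets": [{"key": {"ids": "3vUp44EBbR3hrRYkA8pj"}}],
}}}
print(next_page_body(request, page))
```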

There is some fairly heavy lifting happening here (a composite aggregation, a nested aggregation and a runtime field), but it works and is easy to adapt to your requirements.

HTH.
