Elasticsearch: bool with regexp, filter and aggs

Time:03-29

I am completely new to AWS Elasticsearch and am working with a dataset of tagged movies. The dataset has five columns: genres, movieId, tag, title, userId. The year of each movie is contained in the title, like so: Waterworld (1995). I want to see how many movies with the tag true story were produced in 2002. Since I first have to match the year, then filter on the tag, and finally count the movies, I tried doing it with a bool query like so:

GET tagged_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "regexp": {
            "title": "(2002)"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "tag": "true story"
          }
        }
      ],
      "aggs": {
        "by_numberofmovies": {
          "terms": {
            "field": "movieId"
          }
        }
      }
    }
  }
}

But I get the following error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "x_content_parse_exception",
        "reason" : "[18:7] [bool] unknown field [aggs]"
      }
    ],
    "type" : "x_content_parse_exception",
    "reason" : "[18:7] [bool] unknown field [aggs]"
  },
  "status" : 400
}

I don't understand this at all, since bool should recognize aggs. I've looked through the documentation and searched the internet, and as far as I can tell bool should indeed accept aggs. Could someone point me to where the problem might be?

Here is an example of a document that this query should match:

{
        "_index" : "tagged_movies",
        "_id" : "EgADsX8B2WnPqWZmot9b",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2011-03-22T04:22:48.000+01:00",
          "genres" : "Comedy",
          "movieId" : 5283,
          "tag" : "true story",
          "title" : "National Lampoon's Van Wilder (2002)",
          "userId" : 121,
          "timestamp" : "2011-03-22 04:22:48"
        }
}

Answer:

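aggs can't be inside the query block. aggs and query are siblings at the top level of the request body, which is why the bool parser reports aggs as an unknown field. Your corrected query should look like the one below. Note that it also uses match instead of term in the filter: a term query looks for the exact string "true story" and would find nothing if tag is mapped as an analyzed text field.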

GET tagged_movies/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "regexp": {
                        "title": "(2002)"
                    }
                }
            ],
            "filter": [
                {
                    "match": {
                        "tag": "true story"
                    }
                }
            ]
        }
    },
    "aggs": {
        "by_numberofmovies": {
            "terms": {
                "field": "movieId"
            }
        }
    }
}
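
If the goal is a single number rather than a list of buckets, the corrected query above can be adapted slightly. A terms aggregation returns one bucket per movieId (only the top ten by default), so the count still has to be read off the buckets. The sketch below is one possible variant, not the only way to do it: it swaps in a cardinality aggregation to count distinct movieId values and sets size to 0 so that no hits are returned. The aggregation name distinct_movies is made up for this example, and match_phrase is used on the assumption that tag is an analyzed text field, where a plain match could also hit tags containing only "true" or only "story".

```json
# Sketch only: "distinct_movies" is a hypothetical aggregation name.
# Assumes "tag" and "title" are analyzed text fields and "movieId" is numeric.
GET tagged_movies/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "regexp": {
            "title": "(2002)"
          }
        }
      ],
      "filter": [
        {
          "match_phrase": {
            "tag": "true story"
          }
        }
      ]
    }
  },
  "aggs": {
    "distinct_movies": {
      "cardinality": {
        "field": "movieId"
      }
    }
  }
}
```

The number of distinct movies then appears under aggregations.distinct_movies.value in the response. Keep in mind that cardinality is an approximate count; it is generally accurate for small result sets like this one but can drift on very large ones.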