I am completely new to AWS Elasticsearch and am working with a dataset of tagged movies. The dataset has five columns: genres, movieId, tag, title, and userId. The year of each movie is contained in the title, like so: Waterworld (1995).
I want to see how many movies with the tag true story were produced in 2002.
Since I first have to match the year, then filter by the tag, and finally count the movies, I tried doing it with a bool query like so:
GET tagged_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "regexp": {
            "title": "(2002)"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "tag": "true story"
          }
        }
      ],
      "aggs": {
        "by_numberofmovies": {
          "terms": {
            "field": "movieId"
          }
        }
      }
    }
  }
}
But I get the following error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "x_content_parse_exception",
        "reason" : "[18:7] [bool] unknown field [aggs]"
      }
    ],
    "type" : "x_content_parse_exception",
    "reason" : "[18:7] [bool] unknown field [aggs]"
  },
  "status" : 400
}
I don't understand this at all, since the bool should recognize aggs. I've looked in the documentation as well as elsewhere online, and it says that bool should indeed recognize aggs. Could someone point me to where the problem might be?
Here is an example of a document that this query should match:
{
  "_index" : "tagged_movies",
  "_id" : "EgADsX8B2WnPqWZmot9b",
  "_score" : 1.0,
  "_source" : {
    "@timestamp" : "2011-03-22T04:22:48.000+01:00",
    "genres" : "Comedy",
    "movieId" : 5283,
    "tag" : "true story",
    "title" : "National Lampoon's Van Wilder (2002)",
    "userId" : 121,
    "timestamp" : "2011-03-22 04:22:48"
  }
}
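The three steps described in the question (take the year from the title, filter by tag, count the movies) can be sketched locally in plain Python to pin down what the result should be. This is a minimal sketch, not an Elasticsearch call: it assumes each document is one tagging event by one user, so the same movie can appear several times and "how many movies" means distinct movieId values. The helper name count_movies and the second, duplicate-tag document are hypothetical.

```python
import re
from typing import Any, Iterable, Mapping

# A four-digit year in parentheses at the end of a title, e.g. "Waterworld (1995)".
YEAR_RE = re.compile(r"\((\d{4})\)\s*$")


def count_movies(docs: Iterable[Mapping[str, Any]], year: int, tag: str) -> int:
    """Count distinct movieIds whose title year equals `year` and whose tag equals `tag`.

    Hypothetical helper that mirrors the intended query logic locally.
    """
    movie_ids = set()
    for doc in docs:
        match = YEAR_RE.search(doc["title"])
        if match is None or int(match.group(1)) != year:
            continue  # step 1: the year in the title does not match
        if doc["tag"] != tag:
            continue  # step 2: the tag does not match
        movie_ids.add(doc["movieId"])  # step 3: collect each movie once
    return len(movie_ids)


if __name__ == "__main__":
    docs = [
        # The _source fields of the sample document above.
        {"genres": "Comedy", "movieId": 5283, "tag": "true story",
         "title": "National Lampoon's Van Wilder (2002)", "userId": 121},
        # Hypothetical: the same movie tagged "true story" by another user.
        {"genres": "Comedy", "movieId": 5283, "tag": "true story",
         "title": "National Lampoon's Van Wilder (2002)", "userId": 999},
    ]
    print(count_movies(docs, 2002, "true story"))  # prints 1: two tag rows, one movie
```

Counting distinct IDs rather than matching rows is the design choice that matters here: the number of hits counts tagging events, which overstates the number of movies whenever several users tag the same film.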
Answer:
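aggs can't be inside the query block: aggs and query are siblings at the top level of the request body. Note that the filter below also uses match instead of term for tag. Assuming tag is a text field (the default mapping for dynamically indexed strings), it is indexed as the separate tokens true and story, so a term query for the exact string "true story" finds nothing. The corrected query should look like this: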
{
  "query": {
    "bool": {
      "must": [
        {
          "regexp": {
            "title": "(2002)"
          }
        }
      ],
      "filter": [
        {
          "match": {
            "tag": "true story"
          }
        }
      ]
    }
  },
  "aggs": {
    "by_numberofmovies": {
      "terms": {
        "field": "movieId"
      }
    }
  }
}
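To make the structural point concrete, here is a minimal Python sketch (standard library only, no Elasticsearch client) that builds the same request body as a dict, with query and aggs as top-level keys, and prints it as JSON. Two additions go beyond the answer and are assumptions: "size": 0 so that only aggregation results come back, and a cardinality aggregation on movieId (named distinct_movies here, a hypothetical name) that returns a single distinct-movie count. The terms aggregation in the answer returns one bucket per movie, 10 buckets by default, whereas cardinality returns one number that is near-exact below its precision_threshold and approximate above it. The function name build_search_body is hypothetical.

```python
import json
from typing import Any, Dict


def build_search_body(year: int, tag: str) -> Dict[str, Any]:
    """Build a search body with "query" and "aggs" as top-level siblings.

    Hypothetical helper; field names (title, tag, movieId) come from the question's index.
    """
    return {
        "size": 0,  # assumption: return aggregation results only, no hits
        "query": {
            "bool": {
                "must": [{"regexp": {"title": f"({year})"}}],  # same pattern as the answer
                "filter": [{"match": {"tag": tag}}],
            }
        },
        # "aggs" sits next to "query", never inside "bool".
        "aggs": {
            "by_numberofmovies": {"terms": {"field": "movieId"}},
            # assumption: a single (possibly approximate) distinct count of movieId
            "distinct_movies": {"cardinality": {"field": "movieId"}},
        },
    }


if __name__ == "__main__":
    body = build_search_body(2002, "true story")
    # The printed JSON can be sent as the body of GET tagged_movies/_search.
    print(json.dumps(body, indent=2))
```

With this body, the distinct count appears in the response under aggregations.distinct_movies.value, and the per-movie buckets under aggregations.by_numberofmovies.buckets.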