I now title is long and confusing. Sorry for that. I have a book database in Elasticsearch 8.2.3 and records like:
{id:"1", "writer": "andy", "genre":"horror", "book_name":"first book of andy"}
{id:"2", "writer": "andy", "genre":"fantasy", "book_name":"second book of andy"}
{id:"3", "writer": "andy", "genre":"romance", "book_name":"thirth book of andy"}
{id:"4", "writer": "mike", "genre":"fantasy", "book_name":"first book of mike"}
{id:"5", "writer": "mike", "genre":"horror", "book_name":"second book of mike"}
{id:"6", "writer": "mike", "genre":"horror", "book_name":"thirth book of mike"}
{id:"7", "writer": "greg", "genre":"romance", "book_name":"first book of greg"}
{id:"8", "writer": "greg", "genre":"horror", "book_name":"second book of greg"}
{id:"9", "writer": "ursa", "genre":"romance", "book_name":"first book of ursa"}
{id:"10", "writer": "ursa", "genre":"fantasy", "book_name":"second book of ursa"}
I want a query that would return authors who have at least one book in both the "horror" and "fantasy" genres.
{"writer": "andy"}
{"writer": "mike"}
Also it would be great if it returns also the total number of books in these genres by these authors.
{"writer": "andy", "book_count": 2}
{"writer": "mike", "book_count": 3}
I can do it with multiple query and some coding with python but it is slow. If there is a way to do it with elsticsearch query I want to use it.
CodePudding user response:
You can use the below query to get expected results
{
"size": 0,
"query": {
"bool": {
"should": [
{
"term": {
"genre.keyword": "horror"
}
},
{
"term": {
"genre.keyword": "fantasy"
}
}
]
}
},
"aggs": {
"distinct_writer": {
"terms": {
"field": "writer.keyword"
}
}
}
}
And this produce below result
"aggregations": {
"distinct_writer": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "mike",
"doc_count": 3
},
{
"key": "andy",
"doc_count": 2
}
]
}
}
CodePudding user response:
One way to do that is to divide your task in 2 elastic queries. Because that will help you to get book_count
as well.
From first query, get all the writers for horror
genre -
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"genre.keyword": "horror"
}
}
]
}
},
"aggs": {
"horror_genre_writers": {
"terms": {
"field": "writer.keyword"
}
}
}
}
Result of First query -
[
{"writer": "andy", "book_count": 1},
{"writer": "mike", "book_count": 2},
{"writer": "greg", "book_count": 1}
]
From second query, get all the writers for fantasy
genre -
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"genre.keyword": "fantasy"
}
}
]
}
},
"aggs": {
"fantasy_genre_writers": {
"terms": {
"field": "writer.keyword"
}
}
}
}
Result of Second query -
[
{"writer": "andy", "book_count": 1},
{"writer": "mike", "book_count": 1},
{"writer": "ursa", "book_count": 1}
]
Now take intersection of the results from both queries to get your desired output -
[
{"writer": "andy", "book_count": 2},
{"writer": "mike", "book_count": 3}
]