I need a query where the results will exclude any userId
s if they have at least 1 document with the tag
set to a value within an 'excluded' list i.e. TAG A
or TAG B
.
I have an index with data like below:
{
"_index": "tags-3",
"_type": "_doc",
"_id": "YYYYYYY",
"_score": 10.272416,
"_source": {
"id": "YYYYYYY",
"userId": "User1",
"tag": "TAG A"
}
},
{
"_index": "tags-3",
"_type": "_doc",
"_id": "ZZZZZZ",
"_score": 10.272416,
"_source": {
"id": "ZZZZZZ",
"userId": "User1",
"tag": "TAG B"
},
{
"_index": "tags-3",
"_type": "_doc",
"_id": "ZZZZZZ",
"_score": 10.272416,
"_source": {
"id": "ZZZZZZ",
"userId": "User2",
"tag": "TAG A"
},
{
"_index": "tags-3",
"_type": "_doc",
"_id": "ZZZZZZ",
"_score": 10.272416,
"_source": {
"id": "ZZZZZZ",
"userId": "User2",
"tag": "TAG D"
},
{
"_index": "tags-3",
"_type": "_doc",
"_id": "ZZZZZZ",
"_score": 10.272416,
"_source": {
"id": "ZZZZZZ",
"userId": "User4",
"tag": "TAG D"
}
For the input above, I would expect an output of:
{
"_index": "tags-3",
"_type": "_doc",
"_id": "ZZZZZZ",
"_source": {
"userId": "User4"
}
since User4
has no documents with the tag
set to TAG A
or TAG B
.
User4
is the only other user with a document with the tag set to TAG D
however since it has another document with TAG B
, it is excluded.
CodePudding user response:
One way to do this would be to:
- Aggregate (group) on the user IDs - this would give you all the user IDs
- Then, aggregate the documents for each user ID (nested aggregation) with a filter for the multiple (or single) tag values you want to exclude - this would give you the total sum of documents with the tag set to an excluded tag for each user ID
- Finally, perform a bucket selector aggregation, only including user IDs which have a count of 0 for any excluded documents; this would give you the users who don't have any documents with any excluded tag values
This query should work, for an excluded tag list of A, B & C:
{
"aggs": {
"user-ids": {
"terms": {
"field": "userId.keyword"
},
"aggs": {
"excluded_tags_agg": {
"filter": {
"bool": {
"should": [
{
"match_phrase": {
"tag.keyword": "TAG A"
}
},
{
"match_phrase": {
"tag.keyword": "TAG B"
}
},
{
"match_phrase": {
"tag.keyword": "TAG C"
}
}
],
"minimum_should_match": 1
}
}
},
"filter_userids_which_do_not_have_any_docs_with_excluded_tags": {
"bucket_selector": {
"buckets_path": {
"doc_count": "excluded_tags_agg > _count"
},
"script": "params.doc_count == 0"
}
}
}
}
},
"size": 0
}