I am wondering why, in a terms_set query, a document whose field specified by minimum_should_match_field has the value 0 behaves as if that value were 1.
To replicate the problem, I took the example from the Elasticsearch docs and broke it into the three steps below.
Step 1:
Create a new index
PUT /job-candidates
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "programming_languages": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}
Step 2:
Create two docs with required_matches set to zero
PUT /job-candidates/_doc/1?refresh
{
  "name": "Jane",
  "programming_languages": [ "c++", "java" ],
  "required_matches": 0
}
and also
PUT /job-candidates/_doc/2?refresh
{
  "name": "Ben",
  "programming_languages": [ "python" ],
  "required_matches": 0
}
Step 3:
Run the following search:
GET /job-candidates/_search
{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "c++", "java" ],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}
Expected results: I expect step 3 to return both docs, "Jane" and "Ben".
Actual results: it only returns the "Jane" doc.
I don't understand. If the minimum number of required matches is 0, doesn't that mean a returned doc does not need to match any terms, so the "Ben" doc should also be returned?
Some links I found that still don't answer my question:
- It looks like minimum_should_match cannot be zero, but it doesn't say how search works if it is zero or more than the number of optional values.
- A discussion of the default value for minimum_should_match.
- But neither discusses the terms_set query in particular.
Any clarification will be appreciated! Thanks.
Answer:
Looking at the terms_set source code, we can see that the underlying Lucene query is called CoveringQuery.
So the explanation can be found in the Lucene source code of CoveringQuery, whose documentation for minimumNumberMatch says:
"Per-document long value that records how many queries should match. Values that are less than 1 are treated like 1: only documents that have at least one matching clause will be considered matches. Documents that do not have a value for minimumNumberMatch do not match."
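To make the quoted rule concrete, here is a minimal Java sketch (Java 16+, because it uses a record) that applies it to the documents from step 2 and the terms from step 3. It is a simplified model, not Lucene's implementation: Candidate, matches and the extra "NoValue" document are hypothetical names introduced only for illustration, and real scoring and indexing are ignored.

```java
import java.util.List;
import java.util.Set;

public class CoveringRuleSketch {

    // Hypothetical, simplified model of one indexed document (not an Elasticsearch or Lucene type).
    record Candidate(String name, Set<String> languages, Long requiredMatches) {}

    // Mirrors the rule documented for Lucene's CoveringQuery:
    //  - a document without a minimumNumberMatch value never matches;
    //  - values below 1 are clamped to 1;
    //  - otherwise the document matches if at least that many query terms match.
    static boolean matches(Candidate doc, List<String> queryTerms) {
        if (doc.requiredMatches() == null) {
            return false;
        }
        long matched = queryTerms.stream().filter(doc.languages()::contains).count();
        long minimumNumberMatch = Math.max(1L, doc.requiredMatches().longValue());
        return matched >= minimumNumberMatch;
    }

    public static void main(String[] args) {
        List<String> queryTerms = List.of("c++", "java");
        List<Candidate> docs = List.of(
            new Candidate("Jane", Set.of("c++", "java"), 0L),
            new Candidate("Ben", Set.of("python"), 0L),
            // Hypothetical extra document that has no required_matches value at all.
            new Candidate("NoValue", Set.of("java"), null));
        for (Candidate doc : docs) {
            System.out.println(doc.name() + " -> " + matches(doc, queryTerms));
        }
    }
}
```

Running main prints Jane -> true, Ben -> false and NoValue -> false: Ben's required value of 0 is raised to 1, and he matches none of the query terms.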
A little further down, the code that sets minimumNumberMatch is pretty self-explanatory:
final long minimumNumberMatch = Math.max(1, minMatchValues.longValue());
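To sum up: it doesn't really make sense to send a terms_set query with minimum_should_match: 0, because honoring 0 literally would make the query equivalent to a match_all query. Instead, any value below 1 is treated as 1, which is why Ben, who matches none of the terms, is not returned.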
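The match_all equivalence can be checked with a small Java sketch (Java 9+ for List.of and Set.of). The unclampedMatch method is a hypothetical rule that honors 0 literally; it is not something Elasticsearch or Lucene actually does, and the "Empty" document is a made-up example with no languages. The clampedMatch method mirrors the Math.max(1, ...) line above.

```java
import java.util.List;
import java.util.Set;

public class ZeroMinimumSketch {

    // Counts how many query terms appear among a document's field values.
    static long matchedTerms(Set<String> fieldValues, List<String> queryTerms) {
        return queryTerms.stream().filter(fieldValues::contains).count();
    }

    // Hypothetical rule: what the query would do if a required value of 0 were honored literally.
    static boolean unclampedMatch(long matched, long required) {
        return matched >= required;
    }

    // Clamped rule, mirroring Math.max(1, ...) in Lucene's CoveringQuery.
    static boolean clampedMatch(long matched, long required) {
        return matched >= Math.max(1L, required);
    }

    public static void main(String[] args) {
        List<String> queryTerms = List.of("c++", "java");
        List<String> names = List.of("Jane", "Ben", "Empty");
        List<Set<String>> docs = List.of(
            Set.of("c++", "java"),
            Set.of("python"),
            Set.of()); // hypothetical document with no languages at all
        for (int i = 0; i < docs.size(); i++) {
            long matched = matchedTerms(docs.get(i), queryTerms);
            // With required = 0, "matched >= 0" is always true, so every document passes (match_all).
            System.out.printf("%s matched=%d unclamped=%b clamped=%b%n",
                names.get(i), matched, unclampedMatch(matched, 0), clampedMatch(matched, 0));
        }
    }
}
```

Every row prints unclamped=true, even for documents that match nothing, while only Jane prints clamped=true.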