Terms Set Query's minimum_should_match_field does not behave as expected when the provided fiel

Time:12-28

I am wondering why, in a "terms set" query, a field specified by minimum_should_match_field that has the value 0 behaves as if it had the value 1.

To reproduce the problem, I adapted the example from the Elasticsearch documentation into the three steps below.

Step 1:

Create a new index

PUT /job-candidates
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "programming_languages": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}

Step 2:

Create two docs with required_matches set to zero

PUT /job-candidates/_doc/1?refresh
{
  "name": "Jane",
  "programming_languages": [ "c++", "java" ],
  "required_matches": 0
}

and also

PUT /job-candidates/_doc/2?refresh
{
  "name": "Ben",
  "programming_languages": [ "python" ],
  "required_matches": 0
}

Step 3:

Search for docs with the following query

GET /job-candidates/_search
{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "c++", "java" ],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

Expected Results: I expect step 3 to return both docs, "Jane" and "Ben".

Actual Results: It only returns the "Jane" doc.

I don't understand. If minimum_should_match is 0, doesn't that mean a returned doc does not need to match any of the terms, so the "Ben" doc should also be returned?

Some links I found that still don't answer my question:

  1. minimum_should_match
  • It looks like minimum_should_match cannot be zero, but it does not say how the search behaves if it is indeed zero or greater than the number of optional values.
  2. A discussion of the default value for minimum_should_match
  • But it does not discuss the "terms set" query in particular.

Any clarification will be appreciated! Thanks.

User response:

When looking at the terms_set source code, we can see that the underlying Lucene query being used is called CoveringQuery.

So the explanation can be found in Lucene's source code for CoveringQuery, whose documentation says:

Per-document long value that records how many queries should match. Values that are less than 1 are treated like 1: only documents that have at least one matching clause will be considered matches. Documents that do not have a value for minimumNumberMatch do not match.

And a little further, the code that sets minimumNumberMatch is pretty self-explanatory:

final long minimumNumberMatch = Math.max(1, minMatchValues.longValue());
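
To make the effect of that clamp concrete, here is a minimal sketch that replays the question's two candidates against it. The class and method names are hypothetical stand-ins written for illustration, not Lucene's actual implementation; only the Math.max(1, ...) line is taken from the quoted source.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the per-document check; not Lucene's actual class.
class CoveringSketch {

    // Counts how many of the query terms appear among the document's field values.
    static long countMatches(Set<String> docTerms, List<String> queryTerms) {
        return queryTerms.stream().filter(docTerms::contains).count();
    }

    // Mirrors the clamp quoted above: a required count below 1 is treated as 1.
    static boolean matches(Set<String> docTerms, List<String> queryTerms, long requiredMatches) {
        final long minimumNumberMatch = Math.max(1, requiredMatches);
        return countMatches(docTerms, queryTerms) >= minimumNumberMatch;
    }

    public static void main(String[] args) {
        List<String> queryTerms = List.of("c++", "java");
        // Both documents store required_matches = 0.
        System.out.println("Jane: " + matches(Set.of("c++", "java"), queryTerms, 0));
        System.out.println("Ben: " + matches(Set.of("python"), queryTerms, 0));
    }
}
```

Under these assumptions, Jane matches two terms and passes the clamped threshold of 1, while Ben matches none and is rejected even though his stored value is 0.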

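We can sum it up by stating that it doesn't really make sense to send a terms_set query whose minimum match count is 0, as it would be equivalent to a match_all query. Because CoveringQuery treats any value below 1 as 1, the "Ben" doc, which matches none of the terms, is not returned.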
