Elasticsearch filter for a field based on several terms matching on all of them

Time:09-30

I want to filter a list of employees by programming language skills such as C, C++, Java, etc. I am using the Elasticsearch DSL in Java, and a document should match only if it contains all of the given terms.

termsQuery returns documents matching any of the terms: if at least one term matches, the document is selected.


I tried the following code, setting minimum_should_match to tags.length so that all given tags must match (like an AND operator), but it failed.

QueryBuilder query = QueryBuilders
            .boolQuery()
            .must(
                    QueryBuilders
                            .termsQuery("tags",tags)
            )
            .minimumShouldMatch(tags.length);

I also tried TermsSetQueryBuilder to check the list of terms, but it throws an exception: minimum_should_match_field not set

QueryBuilder query =
            QueryBuilders
                    .boolQuery()
                    .should(
                            new TermsSetQueryBuilder("tags", tags)
                    )
                    .minimumShouldMatch(tags.size());

I also tried setting minimum_should_match_field on the TermsSetQueryBuilder, but it only accepts a String, not a numeric value or percentage as mentioned here. I tried minimum_should_match_field = "2" and minimum_should_match_field = "100%", and even setMinimumShouldMatchScript. None of these worked.

QueryBuilder query =
            QueryBuilders
                    .boolQuery()
                    .should(
                            new TermsSetQueryBuilder("tags", tags)
                                    .setMinimumShouldMatchField(tags.size())
                    )
                    .minimumShouldMatch(tags.size());

How can I filter for a field("tags") based on several terms("tags": ["JAVA", "C"]) matching on all of them?

UPDATE: My code now looks like the following:

public List<Employee> getEmployeesByFilters(List<String> terms) {

    int required_matches = terms.size();
    QueryBuilder query =
            QueryBuilders
                    .boolQuery()
                    .must(
                            new TermsSetQueryBuilder(
                                    "filters", terms)
                                    .setMinimumShouldMatchField("required_matches")
                    );

    NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
            .withQuery(query)
            .build();

    List<Employee> employees = elasticsearchRestTemplate
            .search(nativeSearchQuery, Employee.class)
            .stream()
            .map(hit -> hit.getContent())
            .collect(Collectors.toList());

    return employees;
}

CodePudding user response:

You can simply use a bool query with one term clause per tag to get the expected result:

{
    "query": {
        "bool": {
            "must": [
                { "term": { "tags": "JAVA" } },
                { "term": { "tags": "C" } }
            ]
        }
    }
}
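As a rough sketch of how the bool query above could be generated for an arbitrary list of tags, the following plain-Java helper assembles the same JSON body with one term clause per tag. The class name AllTagsBoolQuery and the helper quote are made up for illustration, and the escaping only handles backslashes and double quotes, so treat it as a starting point rather than a finished implementation.

```java
import java.util.List;
import java.util.stream.Collectors;

class AllTagsBoolQuery {
    // Minimal JSON string quoting (backslash and double quote only);
    // an assumption that is good enough for simple tag values.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds {"query":{"bool":{"must":[{"term":{field:tag}}, ...]}}}
    // with one term clause per tag, so every tag has to match.
    static String build(String field, List<String> tags) {
        String clauses = tags.stream()
                .map(tag -> "{\"term\":{" + quote(field) + ":" + quote(tag) + "}}")
                .collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{\"must\":[" + clauses + "]}}}";
    }

    public static void main(String[] args) {
        System.out.println(build("tags", List.of("JAVA", "C")));
    }
}
```

With the QueryBuilders API used in the question, the equivalent would presumably be a loop that calls must(QueryBuilders.termQuery("tags", tag)) on a single BoolQueryBuilder.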

The Terms Set query is not working because it needs a numeric field in each document, such as required_matches, that holds the number of matching terms required to return that document. This field must be set when the document is indexed.

So your indexed document will look something like this:

{
  "name": "Jane Smith",
  "tags": [ "C", "JAVA" ],
  "required_matches": 2
}

And your query will look like this:

{
  "query": {
    "terms_set": {
      "tags": {
        "terms": [ "JAVA", "C" ],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}
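To make the matching rule concrete, here is a small in-memory model of how a terms_set query with minimum_should_match_field is understood to decide a match: the threshold comes from the document, not from the query. The class TermsSetFieldModel and its method are hypothetical and only mimic the semantics; nothing here talks to Elasticsearch.

```java
import java.util.List;
import java.util.Set;

class TermsSetFieldModel {
    // Mimics terms_set with minimum_should_match_field:
    // count the distinct query terms present in the document's tags
    // and compare against the threshold stored in the document itself.
    static boolean matches(Set<String> docTags, int requiredMatches, List<String> queryTerms) {
        long hits = queryTerms.stream().distinct().filter(docTags::contains).count();
        return hits >= requiredMatches;
    }

    public static void main(String[] args) {
        Set<String> janeTags = Set.of("C", "JAVA"); // Jane Smith, required_matches = 2
        System.out.println(matches(janeTags, 2, List.of("JAVA", "C")));
        System.out.println(matches(janeTags, 2, List.of("JAVA")));
        System.out.println(matches(janeTags, 2, List.of("JAVA", "C", "PYTHON")));
    }
}
```

Under this model the third call also matches, because two of the three query terms are present and the document only asks for two. If that reading is right, a per-document field does not by itself enforce "all query terms must match".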

I hope you will be able to create Java code from this query.

Updated:

QueryBuilder query = QueryBuilders.boolQuery()
                .must(new TermsSetQueryBuilder("tags", tags).setMinimumShouldMatchField("required_matches"));

Updated 2:

You can use minimum_should_match_script with params.num_terms as the script. params.num_terms is the number of terms given in the query, so a document is returned only if it matches all of them.

Elasticsearch Query:

{
  "query": {
    "terms_set": {
      "tags": {
        "terms": [
          "JAVA",
          "C"
        ],
        "minimum_should_match_script": {
          "source": "params.num_terms"
        },
        "boost": 1
      }
    }
  }
}

Java code:

// params.num_terms is provided by Elasticsearch (the number of terms in the query),
// so the params map can stay empty.
Map<String, Object> param = new HashMap<>();
Script script = new Script(ScriptType.INLINE, "painless", "params.num_terms", param);

QueryBuilder query = QueryBuilders.boolQuery()
        .must(new TermsSetQueryBuilder("tags", tags).setMinimumShouldMatchScript(script));
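The effect of using params.num_terms as the threshold can be pictured with a tiny in-memory model: the required count equals the number of query terms, so a document matches only when it contains every one of them. TermsSetScriptModel is a hypothetical name, and the sketch assumes the query terms are distinct.

```java
import java.util.List;
import java.util.Set;

class TermsSetScriptModel {
    // Mimics terms_set with the script "params.num_terms":
    // the threshold is the number of terms in the query.
    static boolean matchesAll(Set<String> docTags, List<String> queryTerms) {
        int numTerms = queryTerms.size(); // plays the role of params.num_terms
        long hits = queryTerms.stream().filter(docTags::contains).count();
        return hits >= numTerms;
    }

    public static void main(String[] args) {
        System.out.println(matchesAll(Set.of("C", "JAVA", "GO"), List.of("JAVA", "C")));
        System.out.println(matchesAll(Set.of("C", "JAVA"), List.of("JAVA", "C", "PYTHON")));
    }
}
```

In this model a document may carry extra tags and still match, but a missing query term always rules it out, which is the AND behaviour the question asks for.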

CodePudding user response:

For reference, my Java code (working version) looks like the following:

// Filtering data by a field("tags") based on several terms - matching on all of the terms
public List<Employee> getEmployeesByTags(List<String> tags) {
    // tags : string list of terms

    Map<String, Object> param = new HashMap<String, Object>();
    param.put("num_terms", tags.size());
    Script script = new Script(ScriptType.INLINE, "painless", "params.num_terms", param);

    QueryBuilder query =
            QueryBuilders
                    .boolQuery()
                    .must(
                            new TermsSetQueryBuilder(
                                    "tags", tags)
                                    .setMinimumShouldMatchScript(script)
                    );

    NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
            .withQuery(query)
            .build();

    List<Employee> employees = elasticsearchRestTemplate
            .search(nativeSearchQuery, Employee.class)
            .stream()
            .map(hit -> hit.getContent())
            .collect(Collectors.toList());

    return employees;
}