How to return all documents that have a subset of the input array in OpenSearch

Time:04-26

I have a document with given structure:

{ "name" : "WF1", "myIndex" : [3, 4, 5] }

Let's say I have 4 such records:

{ "name" : "WF1", "myIndex" : [3, 4, 5] }
{ "name" : "WF2", "myIndex" : [6, 7, 8] }
{ "name" : "WF3", "myIndex" : [9, 10, 11] }
{ "name" : "WF4", "myIndex" : [3, 6, 9] }

If I run the "terms" query below:

GET myIndex/_search
{
  "query": {
    "terms": {
      "myIndex": [
        3, 6, 9, 20
      ]
    }
  }
}

It returns all 4 records, whereas I only want the record whose values are all contained in the input (3, 6, 9), i.e. only WF4. Basically, I want the documents whose array is a subset of the input array.

Note: I can tweak my document structure to achieve this. Is this possible in OpenSearch?

CodePudding user response:

TL;DR

To the best of my knowledge, there is no solution in either Elasticsearch or OpenSearch. But I think you can hack your way through it by treating the numbers as words.

The Hack

Index the documents with the myIndex field as a string of space-separated numbers. You can then search for those numbers using a match query with the minimum_should_match parameter.

DELETE 72004393

POST _bulk
{"index":{"_index":"72004393"}}
{"name":"WF1","myIndex":"3 4 5"}
{"index":{"_index":"72004393"}}
{"name":"WF2","myIndex":"6 7 8"}
{"index":{"_index":"72004393"}}
{"name":"WF3","myIndex":"9 10 11"}
{"index":{"_index":"72004393"}}
{"name":"WF4","myIndex":"3 6 9"}


GET /72004393/_search
{
  "query": {
    "match": {
      "myIndex": {
        "query": "3 6 9 20",
        "minimum_should_match": 3
      }
    }
  }
}

This returns something like:

{
  ...
  "hits" : {
    ...
    "max_score" : 2.0794413,
    "hits" : [
      {
        "_index" : "72004393",
        "_id" : "xaMuYoABOgujegeQJgZr",
        "_score" : 2.0794413,
        "_source" : {
          "name" : "WF4",
          "myIndex" : "3 6 9"
        }
      }
    ]
  }
}

This is not perfect and may lead to some edge cases, but this is the closest I could get to a "solution".
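To make the edge cases concrete, here is a rough local model of the hack in Python. It is only a sketch: it assumes the field is split on whitespace and that the match query counts how many distinct query tokens appear in the document, and it does not talk to OpenSearch at all. The documents WF5 and WF6 are hypothetical and are not part of the question's data.

```python
def min_should_match(doc_text, query_text, minimum):
    """Rough model of match + minimum_should_match: count shared tokens."""
    doc_tokens = set(doc_text.split())
    query_tokens = set(query_text.split())
    return len(doc_tokens & query_tokens) >= minimum


def is_subset_of_query(doc_text, query_text):
    """What the question actually asks for: every doc value is in the input."""
    return set(doc_text.split()) <= set(query_text.split())


QUERY = "3 6 9 20"
DOCS = {
    "WF1": "3 4 5",
    "WF4": "3 6 9",
    "WF5": "3 6",        # hypothetical: a subset, but only 2 tokens match
    "WF6": "3 6 9 100",  # hypothetical: 3 tokens match, but not a subset
}

# For each document: (matched by the hack, is a true subset of the input)
results = {
    name: (min_should_match(text, QUERY, 3), is_subset_of_query(text, QUERY))
    for name, text in DOCS.items()
}

for name, (matched, subset) in results.items():
    print(f"{name}: matched={matched} subset={subset}")
```

Under these assumptions the hack agrees with the subset requirement for WF1 and WF4, misses WF5 (a shorter subset), and wrongly accepts WF6 (three hits plus an extra value). A fixed minimum_should_match only works when every document holds the same number of values.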

CodePudding user response:

You can achieve this with the terms_set query.

Example mapping:

{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "myIndex": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}

Example query:

{
  "query": {
    "terms_set": {
      "myIndex": {
        "terms": [3, 6, 9, 20],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

In your case, required_matches should be indexed as the number of items in the myIndex array (3 for each of the sample documents).
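As a sketch of how required_matches could be filled in before indexing, the Python below adds the field to each record and builds a _bulk request body. The helper names and the index name "wf-demo" are hypothetical. Counting distinct values (rather than raw array length) is my assumption, on the reasoning that a repeated value can only ever match one query term. The terms_set_matches function is just a local model of the matching rule, not a call to OpenSearch.

```python
import json

RECORDS = [
    {"name": "WF1", "myIndex": [3, 4, 5]},
    {"name": "WF2", "myIndex": [6, 7, 8]},
    {"name": "WF3", "myIndex": [9, 10, 11]},
    {"name": "WF4", "myIndex": [3, 6, 9]},
]


def with_required_matches(record):
    """Return a copy of the record with required_matches set."""
    doc = dict(record)
    # Assumption: count distinct values, since duplicates match only once.
    doc["required_matches"] = len(set(record["myIndex"]))
    return doc


def bulk_body(records, index_name):
    """Build the newline-delimited body expected by POST _bulk."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(with_required_matches(record)))
    return "\n".join(lines) + "\n"


def terms_set_matches(doc, terms):
    """Local model: number of matching terms >= the doc's required_matches."""
    matched = len(set(doc["myIndex"]) & set(terms))
    return matched >= doc["required_matches"]


docs = [with_required_matches(r) for r in RECORDS]
matching = [d["name"] for d in docs if terms_set_matches(d, [3, 6, 9, 20])]

print(bulk_body(RECORDS, "wf-demo"))  # "wf-demo" is a hypothetical index name
print(matching)
```

With the four sample records, only WF4 passes the modelled check, because it is the only document whose three values all appear in the input terms. Since each document carries its own required_matches, arrays of different lengths are handled as well.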
