Filtering Documents by a Nested Field Value in Elasticsearch

Time:07-14

My demo data:

{
  "id": "1",
  "username": "demo",
  "email": "dasdasdas@dsadas",
  "number": "000111000",
  "createdDate": "2022-07-13",
  "educations": [
    {
      "name": "test01",
      "score": "5.00",
      "config": {
        "configName": "Ha Ha",
        "isVisible": true
      }
    },
    {
      "name": "demo02",
      "score": "4.50",
      "config": {
        "configName": "Hi Hi",
        "isVisible": false
      }
    },
    {
      "name": "demo03",
      "score": "4.00",
      "config": {
        "configName": "Hu Hu",
        "isVisible": true
      }
    }
  ]
}

Now, I want to show only the data where educations.config.isVisible = true.

The Java code I tried is the following:

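import static org.elasticsearch.index.query.QueryBuilders.nestedQuery;
import static org.elasticsearch.index.query.QueryBuilders.termQuery;
import org.apache.lucene.search.join.ScoreMode;

boolQueryBuilder = boolQueryBuilder.must(
                       nestedQuery("educations.config",
                           termQuery("educations.config.isVisible", true),
                           ScoreMode.Total));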

But it returns all the data.

Can anyone help me out with the query?

Answer:

Well, the demo data you provided consists of a single document, and the data you want to filter on sits inside a nested array within that document.

By default, Elasticsearch returns every document that matches a query in full. Since one of the nested objects matches your query, the complete source of the document is returned. This is correct and intended behavior.
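To make the "whole document comes back" behavior concrete, here is a small plain-Java sketch that mimics nested-query semantics on the demo data. The classes `Config`, `Education`, `Doc` and the methods `nestedMatches`/`search` are hypothetical illustrations, not Elasticsearch APIs: a document matches if any of its nested objects matches, and the hit is the unmodified document.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical plain-Java model of the demo document; these are NOT Elasticsearch classes.
public class NestedMatchDemo {
    static final class Config {
        final String configName;
        final boolean isVisible;

        Config(String configName, boolean isVisible) {
            this.configName = configName;
            this.isVisible = isVisible;
        }
    }

    static final class Education {
        final String name;
        final Config config;

        Education(String name, Config config) {
            this.name = name;
            this.config = config;
        }
    }

    static final class Doc {
        final String id;
        final List<Education> educations;

        Doc(String id, List<Education> educations) {
            this.id = id;
            this.educations = educations;
        }
    }

    // Nested-query semantics: the document matches if ANY nested object matches.
    static boolean nestedMatches(Doc doc) {
        return doc.educations.stream().anyMatch(e -> e.config.isVisible);
    }

    // A hit is the whole document; non-matching nested objects are not stripped.
    static List<Doc> search(List<Doc> index) {
        return index.stream()
                .filter(NestedMatchDemo::nestedMatches)
                .collect(Collectors.toList());
    }

    static Doc demoDoc() {
        return new Doc("1", List.of(
                new Education("test01", new Config("Ha Ha", true)),
                new Education("demo02", new Config("Hi Hi", false)),
                new Education("demo03", new Config("Hu Hu", true))));
    }

    public static void main(String[] args) {
        List<Doc> hits = search(List.of(demoDoc()));
        // Prints "1 hit(s), educations in hit: 3"
        System.out.println(hits.size() + " hit(s), educations in hit: " + hits.get(0).educations.size());
    }
}
```

The single hit still carries all three educations, including the one with `isVisible: false`, which is exactly what the question observes.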

If you instead want only the matching parts of a document, there are several options, though none of them may yield exactly what you intend:

While you could use _source_includes and/or _source_excludes to get partial results from Elasticsearch, as far as I know you cannot do so conditionally. For example, you could run

GET test_index/_search?_source_excludes=educations

but this removes the entire educations field from every hit, rather than filtering it on a condition as your question requires.
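The following plain-Java sketch shows why source filtering cannot express "keep only visible educations": it models a hit's _source as a `Map` and drops the excluded field with no predicate involved. The class and the `applySourceExcludes` method are hypothetical, and only a top-level field is modeled (the real parameter also accepts wildcards and dotted paths).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: models a hit's _source as a Map to show the effect of
// _source_excludes=educations. This is not how Elasticsearch implements it.
public class SourceExcludesDemo {
    // Returns a copy of the source without the excluded top-level field.
    // There is no condition here: the field is dropped whether or not
    // any of its entries matched the query.
    static Map<String, Object> applySourceExcludes(Map<String, Object> source, String excludedField) {
        Map<String, Object> filtered = new LinkedHashMap<>(source);
        filtered.remove(excludedField);
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("id", "1");
        source.put("username", "demo");
        source.put("educations", List.of(
                Map.of("name", "test01", "config", Map.of("configName", "Ha Ha", "isVisible", true)),
                Map.of("name", "demo02", "config", Map.of("configName", "Hi Hi", "isVisible", false))));
        // Prints "{id=1, username=demo}"
        System.out.println(applySourceExcludes(source, "educations"));
    }
}
```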

One way to get close to what you want is to use inner_hits in a nested query:

GET test_index/_search
{
  "query": {
    "nested": {
      "path": "educations.config",
      "query": {
        "match": {"educations.config.isVisible": true}
      },
      "inner_hits": {} 
    }
  }
}

Each hit then gets an additional inner_hits section that contains only the nested (sub-)hits that meet the search condition. By default, the _source of each hit inside inner_hits is relative to its _nested metadata, so in the example above only the config object is returned per nested hit, not the entire source of the top-level document:

"_source" : {
  "id" : "1",
  "username" : "demo",
  "email" : "dasdasdas@dsadas",
  "number" : "000111000",
  "createdDate" : "2022-07-13",
  "educations" : [
    {
      "name" : "test01",
      "score" : "5.00",
      "config" : {
        "configName" : "Ha Ha",
        "isVisible" : true
      }
    },
    {
      "name" : "demo02",
      "score" : "4.50",
      "config" : {
        "configName" : "Hi Hi",
        "isVisible" : false
      }
    },
    {
      "name" : "demo03",
      "score" : "4.00",
      "config" : {
        "configName" : "Hu Hu",
        "isVisible" : true
      }
    }
  ]
},
"inner_hits" : {
  "educations.config" : {
    "hits" : {
      "total" : {
        "value" : 2,
        "relation" : "eq"
      },
      "max_score" : 0.4700036,
      "hits" : [
        {
          "_index" : "test_index",
          "_type" : "_doc",
          "_id" : "dQSz94EBbpRrdmjWWpzK",
          "_nested" : {
            "field" : "educations.config",
            "offset" : 0
          },
          "_score" : 0.4700036,
          "_source" : {
            "configName" : "Ha Ha",
            "isVisible" : true
          }
        },
        {
          "_index" : "test_index",
          "_type" : "_doc",
          "_id" : "dQSz94EBbpRrdmjWWpzK",
          "_nested" : {
            "field" : "educations.config",
            "offset" : 2
          },
          "_score" : 0.4700036,
          "_source" : {
            "configName" : "Hu Hu",
            "isVisible" : true
          }
        }
      ]
    }
  }
}
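As a rough mental model of the inner_hits section above, this plain-Java sketch keeps only the config objects with `isVisible == true` and records their array positions, which line up with the `_nested.offset` values 0 and 2 in the response. `InnerHitsDemo`, `Config`, `InnerHit` and `innerHits` are hypothetical names; this is not an Elasticsearch API and ignores scoring.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch of what inner_hits reports; not an Elasticsearch API.
public class InnerHitsDemo {
    static final class Config {
        final String configName;
        final boolean isVisible;

        Config(String configName, boolean isVisible) {
            this.configName = configName;
            this.isVisible = isVisible;
        }
    }

    // One inner hit: the array position (like _nested.offset) plus the nested object's own source.
    static final class InnerHit {
        final int offset;
        final Config source;

        InnerHit(int offset, Config source) {
            this.offset = offset;
            this.source = source;
        }
    }

    // Keeps only the nested objects that satisfy isVisible == true, remembering their offsets.
    static List<InnerHit> innerHits(List<Config> configs) {
        List<InnerHit> hits = new ArrayList<>();
        for (int i = 0; i < configs.size(); i++) {
            Config c = configs.get(i);
            if (c.isVisible) {
                hits.add(new InnerHit(i, c));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Config> configs = List.of(
                new Config("Ha Ha", true),
                new Config("Hi Hi", false),
                new Config("Hu Hu", true));
        // Prints "offset 0: Ha Ha" and then "offset 2: Hu Hu"
        for (InnerHit hit : innerHits(configs)) {
            System.out.println("offset " + hit.offset + ": " + hit.source.configName);
        }
    }
}
```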

For a Java solution, configure inner_hits on the NestedQueryBuilder returned by QueryBuilders.nestedQuery, for example via its innerHit(new InnerHitBuilder()) method. A Stack Overflow post on Elasticsearch inner hits in the Java API may also help.

You could also combine the two (_source_excludes and inner_hits) to make your response more compact, assuming educations is also of nested type:

GET test_index/_search?_source_excludes=educations
{
  "query": {
    "nested": {
      "path": "educations",
      "query": {
        "match": {"educations.config.isVisible": true}
      },
      "inner_hits": {} 
    }
  }
}
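If you want to try the combined request from Java without the Elasticsearch client library, here is a sketch that sends the same body with the JDK's built-in `java.net.http` client (Java 11+). This swaps in plain HTTP for the NestedQueryBuilder route mentioned above. The cluster URL `http://localhost:9200`, the index `test_index`, the absence of authentication, and the class name `CombinedSearchRequest` are all assumptions; it uses POST, which `_search` accepts with a request body just as it accepts GET.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: sends the combined _source_excludes + inner_hits search with the
// JDK HTTP client. Assumes a local, unauthenticated cluster and an index named test_index.
public class CombinedSearchRequest {
    // Same body as the console request: nested query on "educations" with inner_hits.
    static final String QUERY =
            "{\"query\":{\"nested\":{"
            + "\"path\":\"educations\","
            + "\"query\":{\"match\":{\"educations.config.isVisible\":true}},"
            + "\"inner_hits\":{}}}}";

    static HttpRequest buildRequest(String baseUrl) {
        return HttpRequest.newBuilder(
                        URI.create(baseUrl + "/test_index/_search?_source_excludes=educations"))
                .header("Content-Type", "application/json")
                // _search accepts a request body with POST as well as GET.
                .POST(HttpRequest.BodyPublishers.ofString(QUERY))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpRequest request = buildRequest("http://localhost:9200"); // assumed local cluster
        System.out.println(request.method() + " " + request.uri());
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        } catch (IOException e) {
            // Reached, for example, when nothing is listening at the assumed address.
            System.out.println("Request failed: " + e);
        }
    }
}
```

Against a running cluster with educations mapped as nested, each hit's _source should no longer contain educations, while its inner_hits.educations section should list only the entries whose config.isVisible is true.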