Home > database >  ElasticSearch: Fetch records from nested Array that "only" include given element/s and fil
ElasticSearch: Fetch records from nested Array that "only" include given element/s and fil

Time:07-22

I am stuck on one of my tasks.

Overview:

  • There are some records on elastic search. Which includes information about the candidates and their employment.
  • There is a field that stores information about the statuses in which the candidate got submitted.
 {
    "submittedJobs": [
        {
            "status": "PendingPM", "jobId": "ABC", ...
        },
        {
            "status": "PendingClient", "jobId": "XYZ", ...
        },
        {
            "status": "PendingPM", "jobId": "WXY", ...
        },
        ...
    ]
}

I want to write an es query to fetch all the records in which submitted jobs array "only" have "pendingPM" statuses and no other statuses.

"query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "submittedJobs",
            "query": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "submittedJobs.status.keyword": "PendingPM"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }

I tried this query, and it returns the records which include "pendingPM" along with other statuses - might use contains() logic.

here is the mapping

"submittedJobs": {
    "type": "nested",
    "properties": {
        "statusId": {
            "type": "long"
        },
        "status": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256,
                    "normalizer": "lowercase_normalizer"
                }
            }
        },
        "jobId": {
            "type": "keyword"
        }
    }
}

For example. let's suppose there are two documents

document #1:
{
    "submittedJobs": [
        {
            "status": "PendingPM", "jobId": "ABC", ...
        },
        {
            "status": "PendingClient", "jobId": "XYZ", ...
        },
        {
            "status": "PendingPM", "jobId": "WXY", ...
        },
        ...
    ]
},

document #2:
{
    "submittedJobs": [
        {
            "status": "PendingPM", "jobId": "ABC", ...
        },
        {
            "status": "PendingPM", "jobId": "WXY", ...
        },
        ...
    ]
}

Only document #2 should be returned, as the entire array contains only "PendingPM" and no other statuses.

Document #1 will be filtered-out since it includes mixed statuses.

Any help will be appreciated.

CodePudding user response:

You can use inner_hits along with nested query to get only the matched results from the document

Adding a working example

Index Mapping:

{
    "mappings": {
        "properties": {
            "submittedJobs": {
                "type": "nested"
            }
        }
    }
}

Search Query:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "nested": {
                        "path": "submittedJobs",
                        "query": {
                            "bool": {
                                "must": [
                                    {
                                        "term": {
                                            "submittedJobs.status.keyword": "PendingPM"
                                        }
                                    }
                                ]
                            }
                        },
                        "inner_hits": {}
                    }
                }
            ]
        }
    }
}

Search Result would be:

"hits": [
            {
                "_index": "73062439",
                "_id": "1",
                "_score": 0.0,
                "_source": {
                    "submittedJobs": [
                        {
                            "status": "PendingPM",
                            "jobId": "ABC"
                        },
                        {
                            "status": "PendingClient",
                            "jobId": "XYZ"
                        },
                        {
                            "status": "PendingPM",
                            "jobId": "WXY"
                        }
                    ]
                },
                "inner_hits": {                         // note this
                    "submittedJobs": {
                        "hits": {
                            "total": {
                                "value": 2,
                                "relation": "eq"
                            },
                            "max_score": 0.4700036,
                            "hits": [
                                {
                                    "_index": "73062439",
                                    "_id": "1",
                                    "_nested": {
                                        "field": "submittedJobs",
                                        "offset": 0
                                    },
                                    "_score": 0.4700036,
                                    "_source": {
                                        "jobId": "ABC",
                                        "status": "PendingPM"
                                    }
                                },
                                {
                                    "_index": "73062439",
                                    "_id": "1",
                                    "_nested": {
                                        "field": "submittedJobs",
                                        "offset": 2
                                    },
                                    "_score": 0.4700036,
                                    "_source": {
                                        "jobId": "WXY",
                                        "status": "PendingPM"
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]

CodePudding user response:

Try this:

Will be return only document with all item of array with status PendingPM.

{
  "query": {
    "bool": {
      "must_not": [
        {
          "nested": {
            "path": "submittedJobs",
            "query": {
              "bool": {
                "must_not": [
                  {
                    "match": {
                      "submittedJobs.status": {
                        "query": "PendingPM"
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
  • Related