I am stuck on one of my tasks.
Overview:
- There are some records on elastic search. Which includes information about the candidates and their employment.
- There is a field that stores information about the statuses in which the candidate got submitted.
{
"submittedJobs": [
{
"status": "PendingPM", "jobId": "ABC", ...
},
{
"status": "PendingClient", "jobId": "XYZ", ...
},
{
"status": "PendingPM", "jobId": "WXY", ...
},
...
]
}
I want to write an es query to fetch all the records in which submitted jobs array "only" have "pendingPM" statuses and no other statuses.
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "submittedJobs",
"query": {
"bool": {
"must": [
{
"term": {
"submittedJobs.status.keyword": "PendingPM"
}
}
]
}
}
}
}
]
}
}
I tried this query, and it returns the records which include "pendingPM" along with other statuses - might use contains() logic.
here is the mapping
"submittedJobs": {
"type": "nested",
"properties": {
"statusId": {
"type": "long"
},
"status": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256,
"normalizer": "lowercase_normalizer"
}
}
},
"jobId": {
"type": "keyword"
}
}
}
For example. let's suppose there are two documents
document #1:
{
"submittedJobs": [
{
"status": "PendingPM", "jobId": "ABC", ...
},
{
"status": "PendingClient", "jobId": "XYZ", ...
},
{
"status": "PendingPM", "jobId": "WXY", ...
},
...
]
},
document #2:
{
"submittedJobs": [
{
"status": "PendingPM", "jobId": "ABC", ...
},
{
"status": "PendingPM", "jobId": "WXY", ...
},
...
]
}
Only document #2 should be returned, as the entire array contains only "PendingPM" and no other statuses.
Document #1 will be filtered-out since it includes mixed statuses.
Any help will be appreciated.
CodePudding user response:
You can use inner_hits along with nested query to get only the matched results from the document
Adding a working example
Index Mapping:
{
"mappings": {
"properties": {
"submittedJobs": {
"type": "nested"
}
}
}
}
Search Query:
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "submittedJobs",
"query": {
"bool": {
"must": [
{
"term": {
"submittedJobs.status.keyword": "PendingPM"
}
}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
Search Result would be:
"hits": [
{
"_index": "73062439",
"_id": "1",
"_score": 0.0,
"_source": {
"submittedJobs": [
{
"status": "PendingPM",
"jobId": "ABC"
},
{
"status": "PendingClient",
"jobId": "XYZ"
},
{
"status": "PendingPM",
"jobId": "WXY"
}
]
},
"inner_hits": { // note this
"submittedJobs": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.4700036,
"hits": [
{
"_index": "73062439",
"_id": "1",
"_nested": {
"field": "submittedJobs",
"offset": 0
},
"_score": 0.4700036,
"_source": {
"jobId": "ABC",
"status": "PendingPM"
}
},
{
"_index": "73062439",
"_id": "1",
"_nested": {
"field": "submittedJobs",
"offset": 2
},
"_score": 0.4700036,
"_source": {
"jobId": "WXY",
"status": "PendingPM"
}
}
]
}
}
}
}
]
CodePudding user response:
Try this:
Will be return only document with all item of array with status PendingPM.
{
"query": {
"bool": {
"must_not": [
{
"nested": {
"path": "submittedJobs",
"query": {
"bool": {
"must_not": [
{
"match": {
"submittedJobs.status": {
"query": "PendingPM"
}
}
}
]
}
}
}
}
]
}
}
}