Let's say the DB contains tens of thousands of docs having the following structure
{
_id: ObjectId("5ef053e819aaa00013a2bd69"),
approvers: [
{
type: "ONE",
details: {
name: "NameOne",
surname: "SurnameOne"
}
},
{
type: "TWO",
details: {
name: "NameTwo",
surname: "SurnameTwo"
}
},
{
type: "THREE",
// details field is missing
}
]
}
I need to select only such docs where there's no approver of type "TWO" or "ONE" or the approver has missing details
I had an idea to use $not
in a combination with $elemMatch
:
{
$or: [
{
"approvers.type": {
$not: {
$in: ["ONE", "TWO"]
}
}
},
{
approvers: {
$not: {
$elemMatch: {
type: { $in: ["ONE", "TWO"]},
details: {$exists: true}
}
}
}
}
]
}
The query works but it's super ineffective since the index is not used. Based on my understanding, the DB engine has to do full collection scan and in each doc, check all the array elements.
Actually the collection has 75k records and every approvers
array can hold up to 3 elements.
Is there any "trick" to make it more effective or the only option would be to change the data structure ?
CodePudding user response:
This is where separated collection is beneficial
Assuming above are projects, a different structure can be in place
//approvals
[{
_id: ObjectId
projectId: ObjectId // etc the _id in your code
type: "one",
details: "some stuff"
},
{
_id: ObjectId
projectId: ObjectId // etc the _id in your code
type: "two",
details: "some stuff"
},
{
_id: ObjectId
projectId: ObjectId // etc the _id in your code
type: "three",
details: "some stuff"
}]
Then you can can get all the projectId where type $ne ["one", "two"], before retrieving the related projects using $in. This should be achievable via aggregation too though I never tried.