db.collection.find(
{
ctu: "product", pe: "1234567890", pl: "ru-ru",
bl: { $elemMatch: { b: "main", l: "ru-ru" } },
ps: false
},
{uid: 1, pd: 1, v: 1, l: 1, mi: 1}
).sort({ updated_at: -1 })
.skip(6150).limit(50)
First index:
{
"ctu" : 1,
"pe" : 1,
"pl" : 1,
"bl" : 1,
"ps" : 1,
"updated_at" : -1
}
Second index:
{
"updated_at" : 1
}
Here MongoDB picks the second index, despite the first one containing every key in the query, and even though the first index runs faster than the second (with the second index, the scan has to examine far more documents than the limit). The same thing happens in my other collections. Can anyone suggest a better index?
I have added an index with all the fields from the find and the sort, in that order, but it is still not used.
CodePudding user response:
It should work. If it doesn't, try specifying the index as a hint:
db.collection.find(
{
ctu: "product", pe: "1234567890", pl: "ru-ru",
bl: { $elemMatch: { b: "main", l: "ru-ru" } },
ps: false
},
{uid: 1, pd: 1, v: 1, l: 1, mi: 1}
).sort({ updated_at: -1})
.skip(6150).limit(50)
.hint({"ctu": 1, "pe": 1, "pl": 1, "bl": 1, "ps": 1, "updated_at": -1})
CodePudding user response:
Suggestion
Change your index definition to use the nested fields of bl directly (and perhaps drop the existing index if it's not being used by anything else):
{
"ctu" : 1,
"pe" : 1,
"pl" : 1,
"bl.b" : 1,
"bl.l" : 1,
"ps" : 1,
"updated_at" : -1
}
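For reference, a sketch of creating that index in mongosh — `collection` here stands in for your actual collection name, as in the question:

```
db.collection.createIndex({
  "ctu": 1,
  "pe": 1,
  "pl": 1,
  "bl.b": 1,
  "bl.l": 1,
  "ps": 1,
  "updated_at": -1
})
```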
Preamble
Questions like this are best answered with an explain plan which helps us better understand how the database executed a query. In this case I think we can do a bit of investigation/testing with the information provided to figure out what might be happening here.
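For example, appending .explain("executionStats") — a standard mongosh cursor method — to the question's query reports the winning plan along with the number of keys and documents examined:

```
db.collection.find(
  {
    ctu: "product", pe: "1234567890", pl: "ru-ru",
    bl: { $elemMatch: { b: "main", l: "ru-ru" } },
    ps: false
  },
  { uid: 1, pd: 1, v: 1, l: 1, mi: 1 }
).sort({ updated_at: -1 })
 .skip(6150).limit(50)
 .explain("executionStats")
```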
As @Noel suggests in their answer, it is possible to force index usage via hinting, but I would generally recommend using that only as a last resort. It is usually much better to allow the database to do its job and select the most efficient plan itself. The fact that the database is not currently selecting the "obvious" index suggests that something may be wrong.
I attempted to create a playground environment for this, but there appear to be some field-reordering issues that impact index creation as well. For reference, that environment was here. You can see though that the bl field gets pulled into the first position, which has somewhat of an impact on our analysis. I also had to use an (equivalent) aggregation there, which just further complicates things.
In my local (6.0) environment, I created the two indexes from the question. I also inserted a handful of documents, some of which match the predicates and some of which don't.
Index definition
Given an index on field x (e.g. .createIndex({x: 1})), a query that filters on the entire field can use the index:
test> db.foo.find({x:123}).explain().queryPlanner.winningPlan
{
stage: 'FETCH',
inputStage: {
stage: 'IXSCAN',
keyPattern: { x: 1 },
...
indexBounds: { x: [ '[123, 123]' ] }
}
}
However, a query that filters on a nested value inside of x cannot use the index:
test> db.foo.find({'x.y':123}).explain().queryPlanner.winningPlan
{
stage: 'COLLSCAN',
filter: { 'x.y': { '$eq': 123 } },
direction: 'forward'
}
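By contrast, an index on the nested field itself can be used by that same query. A quick sketch from the same test environment (output abbreviated):

```
test> db.foo.createIndex({'x.y': 1})
test> db.foo.find({'x.y': 123}).explain().queryPlanner.winningPlan
{
  stage: 'FETCH',
  inputStage: {
    stage: 'IXSCAN',
    keyPattern: { 'x.y': 1 },
    ...
    indexBounds: { 'x.y': [ '[123, 123]' ] }
  }
}
```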
This is manifesting in your particular setup because your index is on the bl field ("bl": 1) but you are filtering on other fields nested inside of it via the $elemMatch (bl: { $elemMatch: { b: "main", l: "ru-ru" } }). We can see this via the indexBounds for the associated plan in the explain output:
indexBounds: {
ctu: [ '["product", "product"]' ],
pe: [ '["1234567890", "1234567890"]' ],
pl: [ '["ru-ru", "ru-ru"]' ],
bl: [ '[MinKey, MaxKey]' ],
ps: [ '[false, false]' ],
updated_at: [ '[MaxKey, MinKey]' ]
}
Specifically, note that the scan on the bl field is not bounded: bl: [ '[MinKey, MaxKey]' ]. Not only does this make the index scan itself less efficient, but it also causes another problem by preventing the index from being used to provide the requested sort. This is observable via the presence of a dedicated 'SORT' stage:
{
stage: 'PROJECTION_SIMPLE',
transformBy: { uid: 1, pd: 1, v: 1, l: 1, mi: 1 },
inputStage: {
stage: 'SORT',
sortPattern: { updated_at: -1 },
...
inputStage: {
stage: 'FETCH',
inputStage: {
stage: 'IXSCAN',
keyPattern: { ctu: 1, pe: 1, pl: 1, bl: 1, ps: 1, updated_at: -1 },
...
}
...
}
}
}
Taken together, both of these inefficiencies result in the optimizer having somewhat of a preference for the single-field index, hence its selection. Modifying the index to replace the bl key with the 'bl.b' and 'bl.l' fields resolves all of these issues and should result in the optimizer having a stronger preference for it over the single-field index:
{
stage: 'LIMIT',
...
inputStage: {
stage: 'PROJECTION_SIMPLE',
...
inputStage: {
stage: 'FETCH',
...
inputStage: {
stage: 'IXSCAN',
indexBounds: {
ctu: [ '["product", "product"]' ],
pe: [ '["1234567890", "1234567890"]' ],
pl: [ '["ru-ru", "ru-ru"]' ],
'bl.b': [ '["main", "main"]' ],
'bl.l': [ '["ru-ru", "ru-ru"]' ],
ps: [ '[false, false]' ],
updated_at: [ '[MaxKey, MinKey]' ]
}
...
}
}
}
}
Pagination (skip/limit)
I also notice that you appear to be doing pagination:
.skip(6150).limit(50)
This is a very inefficient approach. Each operation has to scan skip + limit index keys (6,200 for the page shown) before returning any results. Consider other approaches instead.
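One common alternative is range-based ("keyset") pagination: remember the last updated_at value from the previous page and filter past it, so each page begins with an index seek instead of walking all the skipped keys. A runnable Node.js sketch of the idea over an in-memory array (field names mirror the question; the data is made up):

```javascript
// In-memory stand-in for documents already sorted by updated_at descending.
const docs = [];
for (let i = 0; i < 200; i++) {
  docs.push({ uid: i, updated_at: 1000 - i });
}

const PAGE = 50;

// skip/limit-style pagination: the server still walks every skipped key.
function pageBySkip(pageNo) {
  return docs.slice(pageNo * PAGE, (pageNo + 1) * PAGE);
}

// Keyset pagination: filter past the last updated_at seen, then take a page.
// In MongoDB this corresponds to adding { updated_at: { $lt: lastSeen } }
// to the filter while keeping the same sort and limit.
function pageByKeyset(lastSeen) {
  const remaining = lastSeen === undefined
    ? docs
    : docs.filter((d) => d.updated_at < lastSeen);
  return remaining.slice(0, PAGE);
}

const first = pageByKeyset();
const second = pageByKeyset(first[first.length - 1].updated_at);
console.log(second[0].uid); // first uid of page 2, same as pageBySkip(1)
```

One caveat: since updated_at may not be unique, a real implementation would sort on (updated_at, _id) and use both fields in the range filter so that documents tied on updated_at are neither skipped nor repeated; that detail is omitted above.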