It seems that the more compound index I add to my collection it gets better to some point and then beyond that the more indexes the slower it becomes.
Is this possible? If so why?
EDITED:
I am referring to read queries. not write queries. I am aware that writes will be slower.
CodePudding user response:
This is the case for any sort of index, not just compound indexes.
In MongoDB (and most databases) a lot of operations are sped up by having an index, at the cost of maintaining each index.
Generally speaking this shouldn't slow down things like a find
but it will very much affect insert
and update
as those change the underlying data and thus requires modifying or rebuilding of each index those changes are linked to.
However, even with inserts and updates an index can help speed up those operations as the query engine can find the documents to update quicker.
In the end it very much a balance as the cost to maintain the indexes, and the space they take up ... can if you were to be overzealous (i.e. creating many, many less used indexes) ... counteract their helpfulness.
For a deeper dive into that, I'd suggest these docs:
https://www.mongodb.com/docs/manual/core/data-model-operations/#std-label-data-model-indexes
https://www.mongodb.com/docs/manual/core/index-creation/
CodePudding user response:
I agree with the information that @Justin Jenkins shared in their answer, as there is absolutely write overhead associated with maintaining indexes. I don't think that answer focuses query performance much though which is what I understand this question to be about. I will give some thoughts about that below, though without additional details about the situation it will necessarily be a little generic.
Although indexes absolutely feel magical at times, they are still just a utility that we make available for the database to use when running operations. Ideally it would never be the case that adding an index would slow down the execution of a query, but unfortunately it can in some circumstances. This is not particularly common which is why it is not often an upfront talking point or concern.
Here are some important considerations:
- The database is responsible for figuring out the index(es) that would result in the most efficient execution plan for every arbitrary query that is executed
- Indexes are data structures. They take up space in memory when loaded from disk and must be traversed to be read.
- The server hosting the database only has finite resources. Every time it uses some of those resources to maintain indexes it reduces the amount of resources available to process queries. It also introduces more possibilities for locking, yielding, or other contention to maintain consistency.
If you are observing a sudden and drastic degradation in query performance, I would tend to suspect a problem associated with the first consideration above. Again while not particularly common, it is possible that the increased number of indexes is now preventing the database from finding the optimal plan. This would be most likely if the query contained an $or
operator, but can happen in other situations as well. Be on the lookout for a different index being reported in the winningPlan
of the explain
output for the query. It would usually happen after a specific number of indexes were created and/or if that new index(es) had a particular definition relevant to the query of interest.
A slower and more linear degradation in performance would seem to be for a different reason, such as the second or third items mentioned above. While memory/cache contention can certainly still degrade performance reasonably quickly, you would not see a shift in the query plans with one of these problems. What can happen here instead is now you have two indexes which (for simplicity) take up twice the amount of space now competing for the same limited space in memory. If what is requested exceeds what is available then the database will have to begin reading useful portions of the indexes (and data) into and out of its cache. This overhead can quickly add up and will result in operations now spending more time waiting for their portion of the index to be made available in memory for reading. I would expect a broader portion of queries to be impacted, though more moderately, in this situation.
In any case, the most actionable broad advice we can give would be for you to review and consolidate your existing indexes. There is a little bit of guidance on the topic here in the documentation. The general idea is that the prefix of the index (the keys at the beginning) are the important ones when it comes to usage for queries. Except for a few special circumstances, a single field index on { A: 1 }
is completely redundant if you have a separate compound index on { A: 1, B: 1 }
. Since the latter index can support all of the operations that the former one can, the former one (single field index in this example) should be removed.
Ultimately you may have to make some tradeoffs about which indexes to maintain and there may not be a 'perfect' index present for every single query. That's okay. Sometimes it is better to let one query do a little extra scanning when one of its predicate fields is not indexed as opposed to maintaining an entirely separate index. There is a tradeoff here at some point and, as @Justin Jenkins put it, it's important to go too far and become overzealous when creating indexes.