Home > Enterprise >  Can a unique field be a prefix in a compound index?
Can a unique field be a prefix in a compound index?

Time:12-30

Does it make any sense at all, since the each value of the prefix will have only one value of any suffix, there is no further sorting that can be done after the prefix?

CodePudding user response:

Yes, a unique field can be the first one listed in an index.

The index will only be used for queries that include a filter criteria on or sort by that unique field. Such an index could also be useful for creating covered queries that return data from one of the other indexed fields.

CodePudding user response:

Can it be? Sure. Should it be? I'm going to somewhat (but not completely) disagree with @Joe and say "probably not".

For this answer we'll use A to denote the unique field and B to denote any subsequent field(s) of consideration.

Two of the more common reasons to include additional keys in an index overall would be:

  1. To further reduce the amount of excess scanning that the database must do to execute a query.
  2. To satisfy a sort requested in the query shape.

As you note in the question, the second motivation here really doesn't apply. While the database would end up performing a manual sort if there were no compound index and the requested query included a sort on { A: 1, B: 1 }, this can be easily worked around. It is possible that an optimization is added to the database in the future to take this knowledge into consideration, but it is not there today as far as I know. Regardless, logically we know that B does not affect ordering of the results due to the uniqueness of A, so this problem can be "worked around" by having the application specify the simpler (but still logically the same) sort on { A: 1 }.

The former consideration is slightly more interesting. Assuming that A's uniqueness is being enforced by the database, we know two things:

  • There exists a single field index of { A: 1 } that is already eligible for use by queries that filter on A.
  • That existing single field index will scan, at most, one extra document that won't be returned to the client per value of A provided in the query predicate. This is because each value of A will only have a maximum of 1 matching document, but that document could not match the full query predicate based on the values of B.

This leads to an interesting consideration point. If the B predicate(s) is highly selective, then the additional compound index could help avoid some document scanning. Again, this is a maximum of 1 excess document scan per value of A. But it comes at the cost of maintaining an entirely separate index that would be redundant if not for the uniqueness constraint of the single field index. In the majority of cases I would doubt that this tradeoff is worthwhile.

If you are enforcing uniqueness of A at the application layer and the associated single field index doesn't exist, then creating the compound index becomes more attractive.

@Joe mentioned one of the specific exceptions that came to my mind as well. If you are only returning the A and B fields to the client, then the compound index would additionally have the advantage of covering the query. In that case you would always be able to avoid fetching and scanning the document when performing the query which may make maintaining this index more attractive.

As is often the case, there is no definitive answer here. Personally I would opt to not have the compound index in the majority of situations, but there would certainly be exceptions.

  • Related