I am trying to create a compound index but not entirely sure what i should apply it on to get most out of performance.
{
status: A - this can be one of three values
reason: B - this can be one of eight values
indicator: true - boolean flag
date1: ISO Date - this may or may not be present
date2: ISO Date - always present
date3: ISO Date - always present
}
The query is to find any documents of a particular status, indicator set as true and where the reason is not in a list of given reasons, and if date1 is set then any where date1 < today. It is then to return the top result ordered by date2 first, if there are two records with the same date2, then to use date3 as a second order and return that.
I have gone for a compound index of date1:1, reason:1, status: 1. But i'm really not sure if that is right and if I should include the other query fields and the sorted fields in the index too, any help much appreciated.
Thanks,
CodePudding user response:
I would start by saying we don't know the data as well as you, thus you are the only one who can truly determine what the best index order is.
For example if 99% of documents have indicator: true
then having that as the first field of your compound index would probably not give the best results, Similarly if date1
is present in the query but the coverage is the entire collection, then again it does not sound like the best match.
Assuming your field data distribution and entropy are somewhat reasonable here is my suggestion:
Build 2 compound indexes, 1 for queries with date
and the other for queries without.
The first index will be date1: -1, status: 1, reason: 1
, you can also build it as a sparse index as you mentioned not all documents will have the date1
field. the descending index on date1
will allow efficient fetching of documents ( again assuming it is a reasonable query ).
The second index I recommend will either be status: 1, reason: 1
or reason: 1, status: 1
, Depending on how these fields are distributed I will choose one.
Try to build an index that will match the least amount of results for your common queries, those will be the most efficient.