Home > Blockchain >  Constructing compound index issues
Constructing compound index issues

Time:09-23

I am trying to create a compound index but not entirely sure what i should apply it on to get most out of performance.

{
    status: A - this can be one of three values
    reason: B - this can be one of eight values
    indicator: true - boolean flag
    date1: ISO Date - this may or may not be present
    date2: ISO Date - always present
    date3: ISO Date - always present
}

The query is to find any documents of a particular status, indicator set as true and where the reason is not in a list of given reasons, and if date1 is set then any where date1 < today. It is then to return the top result ordered by date2 first, if there are two records with the same date2, then to use date3 as a second order and return that.

I have gone for a compound index of date1:1, reason:1, status: 1. But i'm really not sure if that is right and if I should include the other query fields and the sorted fields in the index too, any help much appreciated.

Thanks,

CodePudding user response:

I would start by saying we don't know the data as well as you, thus you are the only one who can truly determine what the best index order is.

For example if 99% of documents have indicator: true then having that as the first field of your compound index would probably not give the best results, Similarly if date1 is present in the query but the coverage is the entire collection, then again it does not sound like the best match.

Assuming your field data distribution and entropy are somewhat reasonable here is my suggestion:

Build 2 compound indexes, 1 for queries with date and the other for queries without.

The first index will be date1: -1, status: 1, reason: 1, you can also build it as a sparse index as you mentioned not all documents will have the date1 field. the descending index on date1 will allow efficient fetching of documents ( again assuming it is a reasonable query ).

The second index I recommend will either be status: 1, reason: 1 or reason: 1, status: 1, Depending on how these fields are distributed I will choose one.

Try to build an index that will match the least amount of results for your common queries, those will be the most efficient.

  • Related