MongoDB is not picking all keys index to sort and find


db.collection.find(
    {
        ctu: "product", pe: "1234567890", pl: "ru-ru", 
        bl: { $elemMatch: { b: "main", l: "ru-ru" } }, 
        ps: false 
    },
    {uid: 1, pd: 1, v: 1, l: 1, mi: 1}
).sort({ updated_at: -1 })
.skip(6150).limit(50)

One Index:

{
    "ctu" : 1,
    "pe" : 1,
    "pl" : 1,
    "bl" : 1,
    "ps" : 1,
    "updated_at" : -1
}

Second index:

{
    "updated_at" : 1 
}

It keeps picking the second one, in spite of the first having all the keys, and even though the first runs faster than the second, because the sort operation processes far more documents than the limit. This is happening in my other collections as well. Can anyone suggest a better index?

I have added an index with all the fields from the find and the sort, in sequence, but it is still not being used.

CodePudding user response:

It should work. If it doesn't, try specifying the index as a hint:

db.collection.find(
    {
        ctu: "product", pe: "1234567890", pl: "ru-ru", 
        bl: { $elemMatch: { b: "main", l: "ru-ru" } }, 
        ps: false 
    },
    {uid: 1, pd: 1, v: 1, l: 1, mi: 1}
).sort({ updated_at: -1})
.skip(6150).limit(50)
.hint({"ctu": 1, "pe": 1, "pl": 1, "bl": 1, "ps": 1, "updated_at": -1})

CodePudding user response:

Suggestion

Change your index definition to use the nested fields of bl directly (and perhaps drop the existing index if it's not being used by anything else):

{
    "ctu" : 1,
    "pe" : 1,
    "pl" : 1,
    "bl.b" : 1,
    "bl.l" : 1,
    "ps" : 1,
    "updated_at" : -1
}
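In mongosh, creating that index would look like the following (the collection name is taken from the question's query; run against your own deployment):

```
// Compound index keyed on the nested array fields. It will be multikey
// because bl is an array, but bl.b / bl.l now get tight bounds.
db.collection.createIndex(
    { ctu: 1, pe: 1, pl: 1, "bl.b": 1, "bl.l": 1, ps: 1, updated_at: -1 }
)
```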

Preamble

Questions like this are best answered with an explain plan which helps us better understand how the database executed a query. In this case I think we can do a bit of investigation/testing with the information provided to figure out what might be happening here.

As @Noel suggests in their answer, it is possible to force index usage via hinting, but I would generally recommend that only as a last resort. It is usually much better to let the database do its job and select the most efficient plan itself. The fact that the database is not currently selecting the "obvious" index suggests that something may be wrong.

I attempted to create a playground environment for this, but there appear to be some field reordering issues that impact index creation as well. For reference, that environment was here. You can see though that the bl field gets pulled into the first position which has somewhat of an impact on our analysis. I also had to use an (equivalent) aggregation there which just further complicates things.

In my local (6.0) environment, I created the two indexes from the question. I also inserted a handful of documents, some which match the predicates and some which don't.


Index definition

Given an index on field x (e.g. .createIndex({x: 1})), a query that filters on the entire field can use the index:

test> db.foo.find({x:123}).explain().queryPlanner.winningPlan
{
  stage: 'FETCH',
  inputStage: {
    stage: 'IXSCAN',
    keyPattern: { x: 1 },
    ...
    indexBounds: { x: [ '[123, 123]' ] }
  }
}

However, a query that filters on a value nested inside of x cannot use that index:

test> db.foo.find({'x.y':123}).explain().queryPlanner.winningPlan
{
  stage: 'COLLSCAN',
  filter: { 'x.y': { '$eq': 123 } },
  direction: 'forward'
}
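For comparison (my own addition, against the same hypothetical foo collection): indexing the dotted path directly lets the planner use an index scan again:

```
// Index the nested path itself rather than the parent field.
db.foo.createIndex({ "x.y": 1 })

// The same query now plans an IXSCAN with tight bounds on x.y
// instead of the COLLSCAN above.
db.foo.find({ "x.y": 123 }).explain().queryPlanner.winningPlan
```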

This is manifesting in your particular setup because your index is on the bl field ("bl" : 1) but you are filtering on other fields nested inside of that via the $elemMatch (bl: { $elemMatch: { b: "main", l: "ru-ru" } }). We can see this via the indexBounds for the associated plan in the explain output:

indexBounds: {
  ctu: [ '["product", "product"]' ],
  pe: [ '["1234567890", "1234567890"]' ],
  pl: [ '["ru-ru", "ru-ru"]' ],
  bl: [ '[MinKey, MaxKey]' ],
  ps: [ '[false, false]' ],
  updated_at: [ '[MaxKey, MinKey]' ]
}

Specifically see that the scan on the bl field is not bounded: bl: [ '[MinKey, MaxKey]' ]. Not only does this make the index scan itself less efficient, but it also causes another problem by preventing the index from being used to provide the requested sort. This is observable by the presence of a dedicated 'SORT' stage:

      {
        stage: 'PROJECTION_SIMPLE',
        transformBy: { uid: 1, pd: 1, v: 1, l: 1, mi: 1 },
        inputStage: {
          stage: 'SORT',
          sortPattern: { updated_at: -1 },
          ...
          inputStage: {
            stage: 'FETCH',
            inputStage: {
              stage: 'IXSCAN',
              keyPattern: { ctu: 1, pe: 1, pl: 1, bl: 1, ps: 1, updated_at: -1 },
              ...
            }
            ...
          }
        }
      }

Taken together, both of these inefficiencies leave the optimizer with only a weak preference for the compound index, hence its frequent loss to the single-field one. Modifying the index to replace the bl key with the 'bl.b' and 'bl.l' fields resolves both issues and should result in the optimizer having a strong preference for it over the single-field index:

{
  stage: 'LIMIT',
  ...
  inputStage: {
    stage: 'PROJECTION_SIMPLE',
    ...
    inputStage: {
      stage: 'FETCH',
      ...
      inputStage: {
        stage: 'IXSCAN',
        indexBounds: {
          ctu: [ '["product", "product"]' ],
          pe: [ '["1234567890", "1234567890"]' ],
          pl: [ '["ru-ru", "ru-ru"]' ],
          'bl.b': [ '["main", "main"]' ],
          'bl.l': [ '["ru-ru", "ru-ru"]' ],
          ps: [ '[false, false]' ],
          updated_at: [ '[MaxKey, MinKey]' ]
        }
        ...
      }
    }
  }
}

Pagination (skip/limit)

I also notice that you appear to be doing pagination:

.skip(6150).limit(50)

This is a very inefficient approach. Each operation has to scan skip + limit index keys (6,200 here) just to return 50 documents, and the cost grows with every page. Consider a range-based (keyset) approach instead, where each page filters past the last sort value already seen.
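To illustrate the keyset idea (my addition, not from the original answer), the next page is requested by filtering on the last-seen updated_at value rather than skipping. A minimal in-memory sketch of the logic in JavaScript; with MongoDB the filter step becomes something like find({updated_at: {$lt: lastSeen}}).sort({updated_at: -1}).limit(50), with a unique tiebreaker field added if updated_at is not unique:

```javascript
// Ten fake documents with ascending updated_at values.
const docs = Array.from({ length: 10 }, (_, i) => ({ _id: i, updated_at: i }));

// Fetch one page of `limit` docs strictly older than `lastSeen`
// (null = first page), sorted by updated_at descending.
// Mirrors .find({updated_at: {$lt: lastSeen}}).sort({updated_at: -1}).limit(n).
function nextPage(collection, lastSeen, limit) {
  return collection
    .filter((d) => lastSeen === null || d.updated_at < lastSeen)
    .sort((a, b) => b.updated_at - a.updated_at)
    .slice(0, limit);
}

const page1 = nextPage(docs, null, 3);
const page2 = nextPage(docs, page1[page1.length - 1].updated_at, 3);
console.log(page1.map((d) => d.updated_at)); // [9, 8, 7]
console.log(page2.map((d) => d.updated_at)); // [6, 5, 4]
```

Each page starts its index scan at the cursor position instead of walking past thousands of skipped keys, so page 124 costs the same as page 1.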
