Home > OS >  How to improve aggregate pipeline
How to improve aggregate pipeline

Time:10-26

I have pipeline

   [
       {'$match':{templateId:ObjectId('blabla')}},
       {
               "$sort" : {
                   "_id" : 1
               }
           },
           {
               "$facet" : {
                   "paginatedResult" : [
                       {
                           "$skip" : 0
                       },
                       {
                           "$limit" : 100
                       }
                   ],
                   "totalCount" : [
                       {
                           "$count" : "count"
                       }
                   ]
               }
           }
       
       ])

Index:

    "key" : {
        "templateId" : 1,
        "_id" : 1
    }

Collection has 10.6M documents 500k of it is with needed templateId. Aggregate use index

"planSummary" : "IXSCAN { templateId: 1, _id: 1 }",

But the request takes 16 seconds. What i did wrong? How to speed up it?

CodePudding user response:

For start, you should get rid of the $sort operator. The documents are already sorted by _id since the documents are already guaranteed to sorted by the { templateId: 1, _id: 1 } index. The outcome is sorting 500k which are already sorted anyway.

Next, you shouldn't use the $skip approach. For high page numbers you will skip large numbers of documents up to almost 500k (rather index entries, but still).

I suggest an alternative approach:

For the first page, calculate an id you know for sure falls out of the left side of the index. Say, if you know that you don't have entries back dated to 2019 and before, you can use a match operator similar to this:

var pageStart = ObjectId.fromDate(new Date("2020/01/01"))

Then, your match operator should look like this:

{'$match' : {templateId:ObjectId('blabla'), _id: {$gt: pageStart}}}

For the next pages, keep track of the last document of the previous page: if the rightmost document _id is x in a certain page, then pageStart should be x for the next page.

So your pipeline may look like this:

[        
   {'$match' : {templateId:ObjectId('blabla'), _id: {$gt: pageStart}}},
   {
       "$facet" : {
           "paginatedResult" : [
               {
                   "$limit" : 100
               }
           ]
        
       }
   }
]

Note, that now the $skip is missing from the $facet operator as well.

  • Related