MongoDB - slow query on old documents (aggregation and sorting)-CodePudding

I have two DBs for testing and each contains thousands/hundreds of thousand of documents. But with the same Schemas and CRUD operations.

Let's call DB1 and DB2.

I am using Mongoose Suddenly DB1 became really slow during:

const eventQueryPipeline = [
  {
    $match: {
      $and: [{ userId: req.body.userId }, { serverId: req.body.serverId }],
    },
  },
  {
    $sort: {
      sort: -1,
    },
  },
];

const aggregation = db.collection
  .aggregate(eventQueryPipeline)
  .allowDiskUse(true);
aggregation.exect((err, result) => {
  res.json(result);
});

In DB2 the same exact query runs in milliseconds up to maximum a 10 seconds
In DB1 the query never takes less than 40 seconds.

I do not understand why. What could I be missing? I tried to confront the Documents and the Indexes and they're the same. Deleting the collection and restrting saving the documents, brings the speed back to normal and acceptable, but why is it happening? Does someone had same experience?

CodePudding user response：

Short answer:

You should create following index:

{ "userId": 1, "serverId": 1, "sort": 1 }

Longer answer

Based on your code (i see that you have .allowDiskUse(true)) it looks like mongo is trying to do in memory sort with "a lot" of data. Mongo has by default 100MB system memory limit for sort operations, and you can allow it to use temporary files on disk to store data if it hits that limit. You can read more about it here: https://www.mongodb.com/docs/manual/reference/method/cursor.allowDiskUse/

In order to optimise the performance of your queries, you can use indexes. Common rule that you should follow when planning indexes is ESR (Equality, Sort, Range). You can read more about it here: https://www.mongodb.com/docs/v4.2/tutorial/equality-sort-range-rule/

If we follow that rule while creating our compound index, we will add equality matches first, in your case "userId" and "serverId". After that comes the sort field, in your case "sort".

If you had a need to additionally filter results based on some range (eg. some value greater than X, or timestamp greater than yday), you would add that after the "sort".

That means your index should look like this:

schema.index({ userId: 1, serverId: 1, sort: 1 });

Additionally, you can probably remove allowDiskUse, and handle err inside aggregation.exec callback (im assuming that aggregation.exect is a typo)