How to filter with pagination efficiently with millions of records in MongoDB?

I know there are a LOT of questions regarding this subject. And while most work, they are really poor in performance when there are millions of records.

I have a collection with 10,000,000 records.

At first I was using mongoose paginator v2 and it took around 8s to get each page with no filtering, and 25s when filtering. Fairly decent compared to the other answers I found googling around. Then I read about aggregate (in another question here on the same subject) and it was a marvel: 7 ms to get each page without filtering, no matter which page it is:

  const pageSize =  req.query.pagesize;
  const currentPage =  req.query.currentpage;

  let recordCount;
  ServiceClass.find().count().then((count) =>{
    recordCount = count;
    ServiceClass.aggregate().skip(currentPage).limit(pageSize).exec().then((documents) => {
      res.status(200).json({
        message: msgGettingRecordsSuccess,
        serviceClasses: documents,
        count: recordCount,
      });
    })
    .catch((error) => {
      res.status(500).json({ message: msgGettingRecordsError });
    });
  }).catch((error) => {
    res.status(500).json({ message: "Error getting record count" });
  });

What I'm having issues with is when filtering. aggregate doesn't really work like find so my conditions are not working. I read the docs about aggregate and tried with [ {$match: {description: {$regex: regex}}} ] inside aggregate as a start but it did not return anything. This is my current working function for filtering and pagination (which takes 25s):

  const pageSize =  req.query.pagesize;
  const currentPage =  req.query.currentpage;

  const filter = req.params.filter;
  const regex = new RegExp(filter, 'i');

  ServiceClass.paginate({
    $or:[
      {code: { $regex: regex }},
      {description: { $regex: regex }},
    ]
  },{limit: pageSize, page: currentPage}).then((documents)=>{
      res.status(200).json({
        message: msgGettingRecordsSuccess,
        serviceClasses: documents
      });
    }).catch((error) => {
    res.status(500).json({ message: "Error getting the records." });
  });

code and description are both indexed: code has a unique index and description has a normal index. I need to find documents that contain a given string in either the code or the description field.

What is the most efficient way to filter and paginate when you have millions of records?

CodePudding user response：

The most efficient way to filter and paginate when you have millions of records is to use MongoDB's built-in filtering and pagination stages ($match, $skip and $limit) in the aggregate() pipeline.

Use the $skip stage to skip a given number of documents and the $limit stage to cap how many are returned. The $match stage filters the documents by the conditions you give it. Keep in mind that $skip still has to walk past every skipped document, so deep pages get slower as the offset grows.
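
As a minimal sketch of turning the page parameters into these two stages: the helper name paginationStages, the default page size of 10 and the cap of 100 are assumptions made for illustration, and the page number is assumed to be 1-based.

```javascript
// Hypothetical helper: converts 1-based page parameters into $skip/$limit stages.
function paginationStages(currentPage, pageSize, maxPageSize = 100) {
  // Query-string values arrive as strings, so parse them first.
  const page = Math.max(1, parseInt(currentPage, 10) || 1);
  // Fall back to 10 per page and cap the size so a client cannot request everything.
  const size = Math.min(maxPageSize, Math.max(1, parseInt(pageSize, 10) || 10));
  return [{ $skip: (page - 1) * size }, { $limit: size }];
}

// Page 3 with 25 documents per page skips the first 50 documents.
console.log(paginationStages("3", "25"));
```

The returned stages can be appended to a pipeline after the $match stage, for example with the spread operator.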

To filter your documents based on the code or description field, you can use the $match operator with the $or operator, like this:

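// Query-string values arrive as strings; $skip and $limit need numbers.
const pageSize = Number(req.query.pagesize);
const currentPage = Number(req.query.currentpage); // assumed to be 1-based
ServiceClass.aggregate([
    { $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] } },
    // Skip whole pages, not single documents.
    { $skip: (currentPage - 1) * pageSize },
    { $limit: pageSize }
])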

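You can also use the $text operator instead of $regex, which tends to be faster for text search queries. It requires a text index on the searched fields, and it matches whole words rather than arbitrary substrings, so it only fits if word-level search is acceptable.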
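
A rough sketch of what a $text-based pipeline could look like is shown here. The helper name textSearchPipeline and the collection name in the comment are assumptions, and the sketch presumes a text index covering code and description already exists.

```javascript
// Hypothetical helper: builds a $text-based pipeline for one page of results.
// Assumes a text index already exists on the searched fields, e.g.
//   db.serviceclasses.createIndex({ code: "text", description: "text" })
// (the collection name here is a guess).
function textSearchPipeline(searchTerm, skip, limit) {
  return [
    // A $match that uses $text has to be the first stage of the pipeline.
    { $match: { $text: { $search: searchTerm } } },
    { $skip: skip },
    { $limit: limit },
  ];
}

const pipeline = textSearchPipeline("cleaning", 0, 25);
console.log(JSON.stringify(pipeline));
// Usage with the model from the question: ServiceClass.aggregate(pipeline)
```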

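It's also important to make sure that the relevant fields (code and description) have indexes. Be aware, though, that a case-insensitive or unanchored regex generally cannot use an index effectively; only a case-sensitive prefix expression such as /^abc/ lets MongoDB narrow the index range it scans.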
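
If a "starts with" search is acceptable for your use case, a case-sensitive anchored regex gives the existing indexes the best chance of being used. The following is only a sketch under that assumption; escapeRegex and prefixFilter are hypothetical helper names, not part of Mongoose or MongoDB.

```javascript
// Escapes regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: case-sensitive "starts with" filter on code or description.
function prefixFilter(userInput) {
  const prefix = new RegExp("^" + escapeRegex(userInput));
  return { $or: [{ code: { $regex: prefix } }, { description: { $regex: prefix } }] };
}

// The resulting object can be passed to a $match stage or to find().
console.log(prefixFilter("AB-1"));
```

Escaping the input also keeps characters such as "." or "*" typed by a user from being interpreted as regex syntax.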

You might have to adjust the query according to your specific use case and data.

CodePudding user response：

The code below returns one page of results together with the total number of documents matching the query, in a single aggregation.

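// Query-string values are strings; convert them before using them in $skip/$limit.
const pageSize = Number(req.query.pagesize);
const currentPage = Number(req.query.currentpage); // 1-based page number
const regex = new RegExp(req.params.filter, 'i'); // same filter as in the question
const skip = (currentPage - 1) * pageSize;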
const query = [
    {
      $match: { $or: [{ code: { $regex: regex } }, { description: { $regex: regex } }] },
    },
    {
      $facet: {
        result: [
          {
            $skip: skip,
          },
          {
            $limit: pageSize,
          },
          {
            $project: {
              createdAt: 0,
              updatedAt: 0,
              __v: 0,
            },
          },
        ],
        count: [
          {
            $count: "count",
          },
        ],
      },
    },
    {
      $project: {
        result: 1,
        count: {
          $arrayElemAt: ["$count", 0],
        },
      },
    },
  ];
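// aggregate() resolves to an array; $facet yields exactly one document in it.
const [page] = await ServiceClass.aggregate(query);
console.log(page);
// page.result holds the documents; page.count is { count: N }, or missing when nothing matched.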
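
To illustrate how the output of this pipeline might be unpacked before sending a response, here is a small sketch. The helper name unpackPage and the sample inputs are made up for illustration; the shapes follow from the $facet and $project stages in the query.

```javascript
// Hypothetical helper: normalises the array returned by the $facet pipeline.
function unpackPage(aggregateResult, pageSize) {
  const [page = {}] = aggregateResult;
  const items = page.result || [];
  // count is { count: N }, or absent when the $match stage matched nothing.
  const total = page.count ? page.count.count : 0;
  return { items, total, totalPages: Math.ceil(total / pageSize) };
}

// Sample shapes, hand-written for illustration (not real query output).
console.log(unpackPage([{ result: [{ code: "A1" }], count: { count: 51 } }], 25));
console.log(unpackPage([{ result: [] }], 25));
```

Handling the missing count explicitly avoids returning undefined as the total when a filter matches no documents.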

Hope it helps.