How can I improve my query speed in MongoDB with Node.js?

Time:10-19

I have a collection that stores values coming from sensors. It looks like this:

const MainSchema: Schema = new Schema(
  {
    deviceId: {
      type: mongoose.Types.ObjectId,
      required: true,
      ref: 'Device',
    },
    sensorId: {
      type: mongoose.Types.ObjectId,
      default: null,
      ref: 'Sensor',
    },
    value: {
      type: Number,
    },
    date: {
      type: Date,
    },
  },
  {
    versionKey: false,
  }
);

I want to fetch data from this collection through my endpoint. The collection holds more than 300,000 documents, and I want to return each reading together with its sensor details (the name and description from "Sensor").

My Sensor Collection:

const Sensor: Schema = new Schema(
  {
    name: {
      type: String,
      required: true,
      min: 3,
    },
    description: {
      type: String,
      default: null,
    },
    type: {
      type: String,
    },
  },
  {
    timestamps: true,
    versionKey: false,
  }
);

I use two methods to get data from MainSchema. The first approach looks like this (using aggregate):

startDate, endDate and _sensorId are passed as parameters to these functions.

const data = await MainSchema.aggregate([
    {
      $lookup: {
        from: 'Sensor',
        localField: 'sensorId',
        foreignField: '_id',
        as: 'sensorDetail',
      },
    },
    {
      $unwind: '$sensorDetail',
    },
    {
      $match: {
        $and: [
          { sensorId: new Types.ObjectId(_sensorId) },
          {
            date: {
              $gte: new Date(startDate),
              $lt: new Date(endDate),
            },
          },
        ],
      },
    },
    {
      $project: {
        sensorDetail: {
          name: 1,
          description: 1,
        },
        value: 1,
        date: 1,
      },
    },
    {
      $sort: {
        _id: 1,
      },
    },
  ]);

The second approach looks like this (using find and populate):

const data = await MainSchema.find({
    sensorId: _sensorId,
    date: {
      $gte: new Date(startDate),
      $lte: new Date(endDate),
    },
  })
    .lean()
    .sort({ date: 1 })
    .populate('sensorId', { name: 1, description: 1});

Execution times for the same data set:

First approach: 25 - 30 seconds

Second approach: 11 - 15 seconds

So how can I get this data faster? Which approach is best practice?

And what else can I do to improve the query speed?

CodePudding user response:

First step

Create indexes for the sensorId and date properties in the collection. You can do this by specifying index: true in your model:

const MainSchema: Schema = new Schema(
  {
    deviceId: { type: mongoose.Types.ObjectId, required: true, ref: 'Device' },
    sensorId: { type: mongoose.Types.ObjectId, default: null, ref: 'Sensor', index: true },
    value: { type: Number },
    date: { type: Date, index: true },
  },
  {
    versionKey: false,
  }
);

Second step

Aggregation queries can leverage indexes only if $match is the first stage in the pipeline, so you should reorder the stages in your aggregation:

const data = await MainSchema.aggregate([
    {
      $match: {
        sensorId: new Types.ObjectId(_sensorId),
        date: {
          $gte: new Date(startDate),
          $lt: new Date(endDate),
        },
      },
    },
    {
      $lookup: {
        from: 'Sensor',
        localField: 'sensorId',
        foreignField: '_id',
        as: 'sensorDetail',
      },
    },
    {
      $unwind: '$sensorDetail',
    },
    {
      $project: {
        sensorDetail: {
          name: 1,
          description: 1,
        },
        value: 1,
        date: 1,
      },
    },
    {
      $sort: {
        _id: 1,
      },
    },
  ]);

CodePudding user response:

Overall @NeNaD's answer touches on a lot of the important points. What I'm going to say in this one should be considered in addition to that other information.

Index

Just to clarify, the ideal index here would be a compound index of { sensorId: 1, date: 1 }. This index follows the ESR Guidance for index key ordering and will provide the most efficient retrieval of the data according to the query predicates specified in the $match stage.

If the index: true annotation in Mongoose creates two separate single-field indexes, then you should manually create this compound index on the collection. MongoDB will only use one of those single-field indexes to execute this query, which will not be as efficient as using the compound index described above.
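With Mongoose, a compound index can be declared at the schema level rather than per field. A minimal sketch, reusing the MainSchema from the question (the collection name in the mongosh alternative is an assumption, since Mongoose derives it from the model name):

```javascript
// Compound index following the ESR (Equality, Sort, Range) guidance:
// equality on sensorId first, then date (used for both the range filter
// and, ideally, the sort).
MainSchema.index({ sensorId: 1, date: 1 });
```

Alternatively, create it directly in mongosh with something like db.mainschemas.createIndex({ sensorId: 1, date: 1 }).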

Also regarding the existing approach, what is the purpose of the trailing $sort?

If the application (a chart in this situation) does not need sorted results then you should remove this stage entirely. If the client does need sorted results then you should:

  • Move the $sort stage earlier in the pipeline (immediately after the $match), and
  • Test if including the sort field in the index improves performance.

As written, the $sort is currently a blocking operation which is going to prevent any results from being returned to the client until they are all processed. If you move the $sort stage up and can change it to sort on date (which probably makes sense for sensor data) then it should automatically use the compound index that we mentioned earlier to provide the sort in a non-blocking manner.
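Putting those suggestions together, the front of the pipeline could look like this (a sketch, assuming the client can consume results sorted by date rather than _id):

```javascript
const data = await MainSchema.aggregate([
  {
    $match: {
      sensorId: new Types.ObjectId(_sensorId),
      date: { $gte: new Date(startDate), $lt: new Date(endDate) },
    },
  },
  // Sorting on date right after the $match lets MongoDB walk the
  // { sensorId: 1, date: 1 } index in order instead of performing a
  // blocking in-memory sort of the full result set.
  { $sort: { date: 1 } },
  // ...the $lookup / $unwind / $project stages follow unchanged...
]);
```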

Stage Ordering

Ordering of aggregation stages is important, both for semantic purposes as well as for performance reasons. The database itself will attempt to do various things (such as reordering stages) to improve performance so long as it does not logically change the result set in any way. Some of these optimizations are described here. As these are version specific anyway, you can always take a look at the explain plan to get a better indication of what specific changes the database has applied. The fact that performance did not improve when you manually moved the $match to the beginning (which is generally a best practice) could suggest that the database was able to automatically do that on your behalf.
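You can inspect that explain plan from Mongoose itself; a sketch (the verbosity argument and the exact plan fields vary by server version):

```javascript
// Ask the server how it would execute the pipeline without
// materializing the full result set.
const plan = await MainSchema.aggregate([
  /* ...same stages as above... */
]).explain('executionStats');

// In the winning plan, look for an IXSCAN on { sensorId: 1, date: 1 }
// and compare totalDocsExamined against nReturned: a large gap means
// the index is not selective enough for this query.
console.log(JSON.stringify(plan, null, 2));
```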

Schema

I'm a little curious about the schema itself. Is there any reason that there are two separate collections here?

My guess is that this is mostly a play at 'normalization' to help reduce data duplication. That is mostly fine, unless you find yourself constantly performing $lookups like this for most of your read operations. You could certainly consider testing what performance (and storage) looks like if you combine them.

Also for this particular operation, would it make sense to just issue two separate queries, one to get the measurements and one to get the sensor data (a single time)? The aggregation matches on sensorId and the value of that field is what is then used to match against the _id field from the other collection. Unless I'm doing the logic wrong, this should be the same data for each of the source documents.
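A sketch of that two-query idea, reusing the schema and parameter names from the question:

```javascript
// One query for the single sensor document (projected down to the
// fields the client actually needs)...
const sensor = await Sensor.findById(_sensorId, { name: 1, description: 1 }).lean();

// ...and one for the measurements, with no $lookup or populate at all.
const readings = await MainSchema.find(
  {
    sensorId: _sensorId,
    date: { $gte: new Date(startDate), $lt: new Date(endDate) },
  },
  { value: 1, date: 1 }
)
  .sort({ date: 1 })
  .lean();
```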

Time Series Collections

Somewhat related to schema, have you looked into using Time Series Collections? I don't know what your specific goals or pain points are, but it seems that you may be working with IoT data. Time Series collections are purpose-built to help handle use cases like that. Might be worth looking into as they may help you achieve your goals with less hassle or overhead.
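For reference, a time series collection is created with an options block rather than a schema change. A mongosh sketch (MongoDB 5.0+; the collection name is illustrative):

```javascript
// Measurements are bucketed internally by metaField and timeField,
// which typically reduces storage and speeds up time-range queries.
db.createCollection('readings', {
  timeseries: {
    timeField: 'date',      // the BSON date present on every document
    metaField: 'sensorId',  // per-series metadata used for bucketing
    granularity: 'seconds',
  },
});
```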
