Optimising the Aggregation Pipeline

I have a MongoDB aggregation pipeline that runs something like the following:


            {
                $match: {
                    "cohortId": payload.cohortId, "template": req.query.template
                },
            }, {
                $lookup: {
                    from: "messages", localField: "messageId", foreignField: "metaData.messageId", as: "message"
                }
            }, {
                $lookup: {
                    from: "webhook", localField: "messageId", foreignField: "messageId", as: "webhook"
                }
            }, {
                $unwind: {
                    path: "$webhook"
                }
            }, {
                $unwind: {
                    path: "$message"
                }
            },

            {
                $project: {
                    "_id": 0,
                    "messageId": 1,
                    "cohortId": 1,
                    "template": 1,
                    "origin": 1,
                    "WBA_AccountId": 1,
                    "WBA_PhoneId": 1,
                    "clientPhone": "$phone",
                    "messageDirection": "$message.metaData.direction",
                    "messageTime": "$message.metaData.time",
                    "messageSent": "$webhook.status.sentFlag",
                    "messageDelivered": "$webhook.status.deliveredFlag",
                    "messageRead": "$webhook.status.readFlag",
                    "messageFailed": "$webhook.status.failedFlag",
                    "messageFailedReason": "$webhook.status.failedReason",
                    "messageSentTime": "$webhook.status.sentTimestamp",
                    "messageDeliveredTime": "$webhook.status.deliveredTimestamp",
                    "messageReadTime": "$webhook.status.readTimestamp",
                    "messageFailedTime": "$webhook.status.failedTimestamp"
                }
            },

where a document in the webhook collection looks like this:

{
  "_id": {
    "$oid": "6374d59618bff45fa34f08f5"
  },
  "messageId": "<Message ID as String>",
  "conversationExpiry": 1668687660,
  "conversationId": "<Conversation ID as String>",
  "billableFlag": "true",
  "WBA_PhoneId": 1234567890, //masked
  "WBA_AccountId": 1234567890, //masked
  "WBA_DisplayPhone": 1234567890, //masked
  "phone": 1234567890, //masked
  "status": {
    "sentFlag": true,
    "sentTimestamp": 1668601237,
    "deliveredFlag": true,
    "readFlag": true,
    "failedFlag": false,
    "deliveredTimestamp": 1668601238,
    "readTimestamp": 1668601250,
    "failedTimestamp": 0,
    "errorMessage": "",
    "errorCode": ""
  },
  "createdAt": {
    "$date": {
      "$numberLong": "1668601238086"
    }
  },
  "updatedAt": {
    "$date": {
      "$numberLong": "1668601250918"
    }
  }
}

and a document in the audience collection looks like this:

{
  "_id": {
    "$oid": "635a840b97405992d3cb794d"
  },
  "WBA_AccountId": 1234567890, //masked
  "WBA_PhoneId": 1234567890, //masked
  "messageId": "<Message ID as String>",
  "phone": 1234567890, //masked
  "cohortId": "<String Value, I refer this in Aggregation Pipeline to trigger it>",
  "createdAt": {
    "$date": {
      "$numberLong": "1666876427333"
    }
  },
  "end": "2022-10-27",
  "origin": "Clevertap_API_Campaigns",
  "start": "2022-10-27",
  "template": "<String Value, I refer this in Aggregation Pipeline to trigger it>",
  "updatedAt": {
    "$date": {
      "$numberLong": "1666876427333"
    }
  }
}

Problem

When I hit the controller that triggers the pipeline with a cohort and template, it takes about 3 minutes to iterate over roughly 6,000 documents and the network call times out.

What can be done to optimise the pipeline?

I am using an M10 Atlas cluster.

Edit: adding a screenshot of the Real Time metrics cluster usage.

CodePudding user response:

There are three opportunities to use indexes in this aggregation pipeline, and currently the database is not able to use any of them.

The first opportunity is with the initial $match on the audiances collection. There are two fields being filtered on, cohortId and template. Both have equality predicates, so the index keys could be in either order:

db.getSiblingDB('Production').audiances.createIndex({ cohortId:1, template:1 })
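
To confirm the planner actually picks the new index up, you can run an explain on an equivalent query (a sketch; the placeholder values stand in for your real cohortId and template):

db.getSiblingDB('Production').audiances
    .find({ cohortId: "<cohort id>", template: "<template name>" })
    .explain("executionStats")
// The winningPlan should show an IXSCAN on { cohortId: 1, template: 1 }
// rather than a COLLSCAN, and totalDocsExamined should drop accordingly.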

Similarly, the database can try to use indexes for both of the $lookup operations. Currently it is doing about 10,000 full scans of each collection:

            {
                "$lookup": 
                ...
                "totalDocsExamined": 97236382,
                "collectionScans": 9908,
                "indexesUsed": [],
            },
            {
                "$lookup": 
                ...
                "totalDocsExamined": 95408871,
                "collectionScans": 9906,
                "indexesUsed": [],
            },
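
(For reference, per-stage figures like totalDocsExamined and collectionScans come from running the aggregation with executionStats verbosity; a sketch, where pipeline is the array of stages from the question:)

// "pipeline" holds the stage array shown in the question
db.getSiblingDB('Production').audiances
    .explain("executionStats")
    .aggregate(pipeline)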

You mentioned in the comments that you "had a index like this in the system" and that you "tried creating a text index on metaData.messageId". The latter index is not appropriate: you are not performing a $text search, so it cannot be used. The first index shown in your screenshot is { phone: 1, metadata.messageid: 1 }. This index cannot be used either, since its first key (phone) is not part of the query the $lookup performs.
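
As a quick illustration of the prefix rule (a sketch with hypothetical values), a compound index can only be used when the query filters on a leading prefix of its keys:

db.messages.createIndex({ phone: 1, "metadata.messageid": 1 })

db.messages.find({ phone: 1234567890 })                                 // can use the index (prefix: phone)
db.messages.find({ phone: 1234567890, "metadata.messageid": "<id>" })   // can use the index (full key set)
db.messages.find({ "metadata.messageid": "<id>" })                      // cannot; the leading key (phone) is absent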

As $lookup does a direct equality comparison, the following two indexes will drastically improve the performance and efficiency of the operation:

db.getSiblingDB('Production').messages.createIndex({ "metaData.messageId": 1 })
db.getSiblingDB('Production').webhook.createIndex({ "messageId": 1 })
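
Once the indexes are built, re-running the earlier explain should show indexesUsed populated and collectionScans at (or near) zero for both $lookup stages. A quick sanity check that the indexes exist (a sketch):

db.getSiblingDB('Production').messages.getIndexes()
db.getSiblingDB('Production').webhook.getIndexes()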