How do you consistently migrate a large MongoDB collection?

I was trying to migrate a large MongoDB collection of ~600k documents, like so:

    for await (const doc of db.collection('collection').find({
        legacyProp: { $exists: true },
    })) {
        // additional data fetching from separate collections here
        const newPropValue = await fetchNewPropValue(doc._id)
        await db.collection('collection').findOneAndUpdate({ _id: doc._id }, [{ $set: { newProp: newPropValue } }, { $unset: ['legacyProp'] }])
    }

After the migration script finished, data kept being updated for about 30 minutes. I concluded this by counting the documents that still contain the legacyProp property:

    db.collection.countDocuments({ legacyProp: { $exists: true } })

which was decreasing on subsequent calls. After a while, the updates stopped and the final count of documents still containing the legacy prop was around 300k, so the update failed silently, resulting in data loss. I'm curious what exactly happened, and most importantly, how do you update large MongoDB collections without any data loss? Keep in mind that there is additional data fetching involved before every update operation.

CodePudding user response:

My first attempt would be to build the functionality of fetchNewPropValue() into an aggregation pipeline.

Have a look at the Aggregation Pipeline Operators.

If this is not possible, then you can try to put all newPropValue values into an array and use it like this. 600k values should fit easily into your RAM.

    const newPropValues = await fetchNewPropValue() // all new values as an array: [{ _id: ..., val: ... }, { _id: ..., val: ... }, ...]
    await db.collection('collection').updateMany(
        { legacyProp: { $exists: true } },
        [
            {
                // pick the entry from the in-memory array whose _id matches this document
                $set: {
                    newProp: {
                        $first: {
                            $filter: { input: newPropValues, cond: { $eq: ["$_id", "$$this._id"] } }
                        }
                    }
                }
            },
            // then unwrap the value and drop the legacy field
            { $set: { legacyProp: "$$REMOVE", newProp: "$newProp.val" } }
        ]
    )

Or you can try bulkWrite:

    let bulkOperations = []
    for await (const doc of db.collection('collection').find({ legacyProp: { $exists: true } })) {
        const newPropValue = await fetchNewPropValue(doc._id)
        bulkOperations.push({
            updateOne: {
                filter: { _id: doc._id },
                update: {
                    $set: { newProp: newPropValue },
                    $unset: { legacyProp: "" }
                }
            }
        })
        // flush in batches so a single bulkWrite doesn't grow unbounded
        if (bulkOperations.length >= 10000) {
            await db.collection('collection').bulkWrite(bulkOperations, { ordered: false })
            bulkOperations = []
        }
    }
    if (bulkOperations.length > 0) {
        await db.collection('collection').bulkWrite(bulkOperations, { ordered: false })
    }