Home > Enterprise >  Efficient way to delete multiple documents in MongoDB
Efficient way to delete multiple documents in MongoDB

Time:11-17

In my PostgresDB, I'm performing a deletion operation using another table as below.

DELETE FROM user_records
USING to_delete_records
WHERE user_records.record_id = to_delete_records.record_id

user_records table contains around 200 million records while to_delete_records table contains around 5-10 million records. Everyday the to_delete_records table is updated with a new set of records, and have to perform the above deletion operation. (similar to deletion, insertion operations (around 5-10 million records) take place as well, hence the total dataset of user_records remains around 200 million)

Now I'm replacing the PostgresDB with a MongoDB, and following is the script I'm using for deleting records in user_records collection:

db.to_delete_records.find({}, {_id: 0}).forEach(function(doc){
    db.user_records.deleteOne({record_id:doc.record_id});  
});

As this is a loop running, seems inefficient.

Is there a better way to delete documents of a collection using another collection in Mongo?

CodePudding user response:

If record_id is a unique field in both user_records and to_delete_records, you can build a unique index for the field for each collection if you have not done so.

db.user_records.createIndex({record_id: 1}, {unique:true});
db.to_delete_records.createIndex({record_id: 1}, {unique:true});

Afterwards, you can use a $merge statement to add an auxiliary field toDelete to the collection user_records, based on the content in to_delete_records

db.to_delete_records.aggregate([
  {
    "$merge": {
      "into": "user_records",
      "on": "record_id",
      "whenMatched": [
        {
          $set: {
            "toDelete": true
          }
        }
      ]
    }
  }
])

Finally run a deleteMany on user_records

db.user_records.deleteMany({toDelete: true});
  • Related