In my PostgreSQL database, I'm performing a deletion using another table, as below:
DELETE FROM user_records
USING to_delete_records
WHERE user_records.record_id = to_delete_records.record_id;
The user_records table contains around 200 million records, while the to_delete_records table contains around 5-10 million. Every day the to_delete_records table is updated with a new set of records, and I have to perform the deletion above. (Similar insertion operations of around 5-10 million records take place as well, so the total size of user_records stays around 200 million.)
Now I'm replacing PostgreSQL with MongoDB, and the following is the script I'm using for deleting records from the user_records collection:
db.to_delete_records.find({}, {_id: 0}).forEach(function (doc) {
    db.user_records.deleteOne({record_id: doc.record_id});
});
Since this issues one deleteOne call per document in a loop, it seems inefficient.
Is there a better way to delete documents of a collection using another collection in Mongo?
Answer:
If record_id is a unique field in both user_records and to_delete_records, you can build a unique index on the field in each collection, if you have not done so already.
db.user_records.createIndex({record_id: 1}, {unique:true});
db.to_delete_records.createIndex({record_id: 1}, {unique:true});
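Optionally, you can verify the indexes are in place before running the merge:

// Optional sanity check: list the indexes on both collections
// and confirm the unique index on record_id exists
db.user_records.getIndexes();
db.to_delete_records.getIndexes();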
Afterwards, you can use a $merge stage to add an auxiliary field toDelete to the user_records collection, based on the content of to_delete_records:
db.to_delete_records.aggregate([
    {
        "$merge": {
            "into": "user_records",
            "on": "record_id",
            "whenMatched": [
                {
                    $set: {
                        "toDelete": true
                    }
                }
            ],
            // without this, record_ids with no match would be inserted
            // into user_records (the default whenNotMatched is "insert")
            "whenNotMatched": "discard"
        }
    }
])
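If you want a sanity check before deleting, you can count how many documents were flagged; the result should equal the overlap between the two collections:

// Optional: count the flagged documents before deleting them
db.user_records.countDocuments({toDelete: true});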
Finally, run a deleteMany on user_records:
db.user_records.deleteMany({toDelete: true});
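As an alternative that avoids writing the auxiliary field to 5-10 million documents, you could batch the record_ids from to_delete_records and issue deleteMany calls with $in. This is a sketch only; the batch size of 10,000 is an assumption you would tune for your workload:

var batchSize = 10000;
var batch = [];
db.to_delete_records.find({}, {_id: 0, record_id: 1}).forEach(function (doc) {
    batch.push(doc.record_id);
    if (batch.length === batchSize) {
        // each deleteMany can use the unique index on record_id
        db.user_records.deleteMany({record_id: {$in: batch}});
        batch = [];
    }
});
// flush the final partial batch
if (batch.length > 0) {
    db.user_records.deleteMany({record_id: {$in: batch}});
}

This still round-trips once per batch, but it cuts the number of delete commands from millions to a few hundred or thousand.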