Context: I have a MongoDB database with some duplicated documents.
Problem: I want to remove all duplicated documents. (For each set of duplicates, I want to keep exactly one document, which can be chosen arbitrarily.)
Minimal illustrative example:
The documents all have the following fields (there are also other fields, but those are of no relevance here):
{
  "_id": {"$oid":"..."},
  "name": "string",
  "user": {"$oid":"..."}
}
Duplicated documents: A document is considered duplicated if there are two or more documents with the same "name" and "user" (i.e. the document id is of no relevance here).
How can I remove the duplicated documents?
CodePudding user response:
EDIT:
Since MongoDB version 4.2, one option is to use $group and $merge to copy one document per (name, user) pair into a new collection:
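db.collection.aggregate([
  // One group per (name, user) pair; keep the first document seen in each group.
  {
    $group: {
      _id: {name: "$name", user: "$user"},
      doc: {$first: "$$ROOT"}
    }
  },
  // Restore the kept document as the top-level document.
  {$replaceRoot: {newRoot: "$doc"}},
  // Write the unique documents to newCollection.
  {$merge: {into: "newCollection"}}
])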
See how it works on the playground example
For older versions, you can do the same using $out.
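To make the effect of the $group, $first and $replaceRoot stages concrete, here is a rough plain-JavaScript sketch of the same idea on an in-memory array. It does not talk to MongoDB; the function name keepFirstPerKey, the string ids and the sample data are made up for illustration only.

```javascript
// Plain-JavaScript sketch of what $group + $first + $replaceRoot produce.
// keepFirstPerKey and the sample data are hypothetical, for illustration only.
function keepFirstPerKey(docs) {
  const firstSeen = new Map();
  for (const doc of docs) {
    // Same role as _id: {name: "$name", user: "$user"} in the $group stage.
    const key = JSON.stringify([doc.name, String(doc.user)]);
    // Same role as {$first: "$$ROOT"}: only the first document per key is kept.
    if (!firstSeen.has(key)) firstSeen.set(key, doc);
  }
  return [...firstSeen.values()];
}

const sampleDocs = [
  { _id: "a1", name: "report", user: "u1" },
  { _id: "a2", name: "report", user: "u1" }, // duplicate of a1
  { _id: "a3", name: "report", user: "u2" }, // same name, different user
  { _id: "a4", name: "notes", user: "u1" },
];

const uniqueDocs = keepFirstPerKey(sampleDocs);
console.log(uniqueDocs.map((d) => d._id)); // [ 'a1', 'a3', 'a4' ]
```

Which document counts as "first" depends on the order in which the documents arrive, which matches the requirement that the kept document can be chosen arbitrarily.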
Another option is to collect the _id of every document that should be removed and then delete those documents with a second query:
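var results = db.collection.aggregate([
  // Group by (name, user); keep the first document and collect every _id in the group.
  {
    $group: {
      _id: {name: "$name", user: "$user"},
      doc: {$first: "$$ROOT"},
      remove: {$push: "$_id"}
    }
  },
  // Drop the kept document's _id from each group's removal list.
  {
    $set: {
      remove: {
        $filter: {
          input: "$remove",
          cond: {$ne: ["$$this", "$doc._id"]}
        }
      }
    }
  },
  // Gather all per-group lists into one document and flatten them into a single array.
  {$group: {_id: 0, remove: {$push: "$remove"}}},
  {
    $set: {
      _id: "$$REMOVE",
      remove: {
        $reduce: {
          input: "$remove",
          initialValue: [],
          in: {$concatArrays: ["$$value", "$$this"]}
        }
      }
    }
  }
]).toArray()
// The pipeline returns at most one document; an empty collection returns none.
var removeList = results.length > 0 ? results[0].remove : []
db.collection.deleteMany({_id: {$in: removeList}})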
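As a sanity check on what the second pipeline computes, here is a small plain-JavaScript sketch that builds the same kind of removal list from an in-memory array. It is only an illustration and runs without MongoDB; idsToRemove, the string ids and the example documents are hypothetical names and data.

```javascript
// Plain-JavaScript sketch of the removal list: every _id except the first one
// seen for each (name, user) pair. idsToRemove and the data are hypothetical.
function idsToRemove(docs) {
  const seenKeys = new Set();
  const remove = [];
  for (const doc of docs) {
    const key = JSON.stringify([doc.name, String(doc.user)]);
    if (seenKeys.has(key)) {
      // A document with this (name, user) pair was already kept: mark this one.
      remove.push(doc._id);
    } else {
      // First document for this pair: keep it.
      seenKeys.add(key);
    }
  }
  return remove;
}

const exampleDocs = [
  { _id: "b1", name: "report", user: "u1" },
  { _id: "b2", name: "report", user: "u1" }, // duplicate of b1
  { _id: "b3", name: "report", user: "u1" }, // duplicate of b1
  { _id: "b4", name: "notes", user: "u2" },
];

const removeIds = idsToRemove(exampleDocs);
console.log(removeIds); // [ 'b2', 'b3' ]
```

The resulting array plays the same role as the list passed to deleteMany in the $in filter above.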