I am trying to create a script to remove duplicates from a collection using the MongoDB aggregation framework.
Input: some of the documents in the collection
{"_id":{"$oid":"6122ca5df5ee9d5974c6e886"},"Name":"ccc","cusId":1}
{"_id":{"$oid":"6122ca5df5ee9d5974c6e887"},"Name":"ddd","cusId":4}
{"_id":{"$oid":"6122ca5df5ee9d5974c6e888"},"Name":"eee","cusId":5}
{"_id":{"$oid":"612bc88610921de4b5a6bd0d"},"Name":"last","cusId":1}
{"_id":{"$oid":"61571141c8cdee23c9a46a3a"},"Name":"lastRecord","cusId":1}
Script to find duplicates except the last inserted document
$group
{
_id: "$cusId",
mongoIds: { $addToSet: "$_id" },
count: { $sum: 1 }
},$match
{
count: { $gt: 1 }
},$project
{
"duplicates":{
"$filter":{
"input":"$mongoIds",
"as":"mongoId",
"cond":{
"$lt":[
"$$mongoId",
{
"$max":"$mongoIds"
}
]
}
}
}
}
the expected results
the expected output of the script is the array duplicates to contain only documents that are duplicated except the last inserted document to the array.
for instance, for documents with cusId 4 and 5, there are no duplicates for this cusId so they should not be included in the duplicates array. but for documents with cusId 1, there should be 2 duplicates (all documents with cusId 1 except the last inserted document)
so the final result should be all duplicates except the last inserted document
{"_id":{"$oid":"6122ca5df5ee9d5974c6e886"},"Name":"ccc","cusId":1}
{"_id":{"$oid":"612bc88610921de4b5a6bd0d"},"Name":"last","cusId":1}
how to remove duplicates where cusId is the same and keep only the latest inserted document in the collection? How can I get document creation time?
CodePudding user response:
You can do the followings in an aggregation pipeline:
$lookup
self collection with a subpipeline- In the subpipeline, do a
$group
to get the latest timestamp through$toDate : "$_id"
- filter the root document by
$ne
to the timestamp of the$lookup
result (i.e. not the latest document) $out
the result to another collection. You can then use this output collection to prune your original collection through a simple $merge
Here is the Mongo playground for your reference.