I am trying to find duplicate documents in MongoDB and delete the older ones, keeping only the latest document for each value of a field.
Below is what my collection looks like. I would like to find duplicates based on guid and keep only the document with the highest revision number for each guid.
{
"_id": {
"$oid": "1201f1196affd5c74ca0af14"
},
"guid": "MEM-bce9",
"revision": 121,
"action": "updated"
}
{
"_id": {
"$oid": "2201f1196affd5c74ca0afc4"
},
"guid": "MEM-bce9",
"revision": 122,
"action": "deleted"
}
{
"_id": {
"$oid": "3201f1196affd5c74ca0afc4"
},
"guid": "MEM-aXt1",
"revision": 21,
"action": "created"
}
{
"_id": {
"$oid": "4201f1196affd5c74ca0afc4"
},
"guid": "MEM-aXt1",
"revision": 22,
"action": "updated"
}
{
"_id": {
"$oid": "5201f1196affd5c74ca0afc4"
},
"guid": "MEM-Mwq0",
"revision": 121,
"action": "updated"
}
Expected Output
{
"_id": {
"$oid": "2201f1196affd5c74ca0afc4"
},
"guid": "MEM-bce9",
"revision": 122,
"action": "deleted"
}
{
"_id": {
"$oid": "4201f1196affd5c74ca0afc4"
},
"guid": "MEM-aXt1",
"revision": 22,
"action": "updated"
}
{
"_id": {
"$oid": "5201f1196affd5c74ca0afc4"
},
"guid": "MEM-Mwq0",
"revision": 121,
"action": "updated"
}
CodePudding user response:
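db.collection.aggregate([
  // No-op as written; put a filter here to restrict which documents are considered.
  { "$match": {} },
  // Highest revision first, so that $first below picks the latest document.
  { "$sort": { "revision": -1 } },
  // One group per guid, keeping the whole document with the highest revision.
  {
    "$group": {
      "_id": "$guid",
      "latest": { "$first": "$$ROOT" }
    }
  },
  // Promote the kept document back to the top level.
  { "$replaceWith": "$latest" }
])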
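The pipeline above returns the latest document per guid but does not delete anything. As a rough sketch of the missing step, the plain JavaScript below works out which _id values are stale, using the sample documents from the question. The function name staleIds and the in-memory array are assumptions made for illustration, and the _id values are written as plain strings rather than ObjectIds.

```javascript
// Return the _id of every document that is NOT the highest revision for its guid.
// If two documents share the same highest revision, the first one seen is kept.
function staleIds(docs) {
  const latest = new Map(); // guid -> document with the highest revision seen so far
  for (const doc of docs) {
    const current = latest.get(doc.guid);
    if (current === undefined || doc.revision > current.revision) {
      latest.set(doc.guid, doc);
    }
  }
  // Anything that is not the kept document for its guid is stale.
  return docs
    .filter((doc) => latest.get(doc.guid) !== doc)
    .map((doc) => doc._id);
}

// Sample data copied from the question (ids simplified to strings).
const sample = [
  { _id: "1201f1196affd5c74ca0af14", guid: "MEM-bce9", revision: 121, action: "updated" },
  { _id: "2201f1196affd5c74ca0afc4", guid: "MEM-bce9", revision: 122, action: "deleted" },
  { _id: "3201f1196affd5c74ca0afc4", guid: "MEM-aXt1", revision: 21, action: "created" },
  { _id: "4201f1196affd5c74ca0afc4", guid: "MEM-aXt1", revision: 22, action: "updated" },
  { _id: "5201f1196affd5c74ca0afc4", guid: "MEM-Mwq0", revision: 121, action: "updated" },
];

const toDelete = staleIds(sample);
console.log(toDelete);
```

In mongosh, the ids collected this way could then be passed to something like db.collection.deleteMany({ _id: { $in: toDelete } }). It is probably worth trying this on a copy of the collection first, since the delete cannot be undone.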
CodePudding user response:
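Let's take the simplest case: in your collection, the _id (which begins with a timestamp) of the document with revision x was created before the one with revision x + 1. In this situation, the natural (insertion) order makes this query work as you need, provided the documents reach the $group stage in that order.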
[
  {
    $group: {
      _id: '$guid',
      items: { $push: '$$ROOT' }
    }
  },
  {
    $addFields: {
      lastRevision: { $last: '$items' }
    }
  }
]
Let's go through this: you are creating one object per guid, in which all the records in the collection that share the same guid are collected into an array. Since the most recent revision sits at a greater index than the previous revisions, you just pick off the last item of the array.
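To make the group-then-take-last idea concrete, here is a small in-memory sketch in plain JavaScript. It is not MongoDB code: the function name lastRevisionPerGuid and the sample array are made up for illustration. Unlike the pipeline, it sorts each array by revision before taking the last item, so the result does not depend on the order in which the documents arrive.

```javascript
// Mimic the two stages: collect documents per guid, then pick the last one.
function lastRevisionPerGuid(docs) {
  const groups = new Map(); // guid -> array of documents (the $push step)
  for (const doc of docs) {
    if (!groups.has(doc.guid)) {
      groups.set(doc.guid, []);
    }
    groups.get(doc.guid).push(doc);
  }

  const result = [];
  for (const [guid, items] of groups) {
    // Explicit ascending sort, so the last element is the highest revision
    // even when the input is not in insertion order.
    items.sort((a, b) => a.revision - b.revision);
    result.push({ _id: guid, items, lastRevision: items[items.length - 1] });
  }
  return result;
}

// Deliberately out of order to show that the explicit sort matters.
const unordered = [
  { guid: "MEM-bce9", revision: 122, action: "deleted" },
  { guid: "MEM-bce9", revision: 121, action: "updated" },
  { guid: "MEM-Mwq0", revision: 121, action: "updated" },
];

const grouped = lastRevisionPerGuid(unordered);
for (const group of grouped) {
  console.log(group._id, group.lastRevision.revision);
}
```

If the same guarantee is wanted inside the aggregation itself, placing a $sort on revision before the $group stage is one way to avoid relying on natural order.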