Home > Net >  Find duplicate and keep the record with latest field value in MongoDB
Find duplicate and keep the record with latest field value in MongoDB

Time:03-04

I am trying to find duplicates and delete old documents and keep the latest document only on the basis of a field value in mongoDB.

Below is how my collection looks like. I would like to find duplicates on the basis of guid and then only keep the document which has the latest revision number.

{
    "_id": {
        "$oid": "1201f1196affd5c74ca0af14"
    },
    "guid": "MEM-bce9",
    "revision": 121,
    "action": "updated"
}
{
    "_id": {
        "$oid": "2201f1196affd5c74ca0afc4"
    },
    "guid": "MEM-bce9",
    "revision": 122,
    "action": "deleted"
}
{
    "_id": {
        "$oid": "3201f1196affd5c74ca0afc4"
    },
    "guid": "MEM-aXt1",
    "revision": 21,
    "action": "created"
}
{
    "_id": {
        "$oid": "4201f1196affd5c74ca0afc4"
    },
    "guid": "MEM-aXt1",
    "revision": 22,
    "action": "updated"
}
{
    "_id": {
        "$oid": "5201f1196affd5c74ca0afc4"
    },
    "guid": "MEM-Mwq0",
    "revision": 121,
    "action": "updated"
}

Expected Output

   {
    "_id": {
        "$oid": "2201f1196affd5c74ca0afc4"
    },
    "guid": "MEM-bce9",
    "revision": 122,
    "action": "deleted"
}
{
    "_id": {
        "$oid": "4201f1196affd5c74ca0afc4"
    },
    "guid": "MEM-aXt1",
    "revision": 22,
    "action": "updated"
}
{
    "_id": {
        "$oid": "5201f1196affd5c74ca0afc4"
    },
    "guid": "MEM-Mwq0",
    "revision": 121,
    "action": "updated"
}

CodePudding user response:

db.collection.aggregate([
  {
    "$match": {}
  },
  {
    "$sort": {
      "revision": -1
    }
  },
  {
    "$group": {
      "_id": "$guid",
      "latest": {
        "$first": "$$ROOT"
      }
    }
  },
  {
    "$replaceWith": "$latest"
  }
])

mongoplayground

CodePudding user response:

Let's take the simplest case In your collection, the _id (as a timestamp) for item with revision x is before revision x 1. In this situation, the natural sort order makes this query work exactly as you need.

[{$group: {
 _id: '$guid',
 items: {
  $push: '$$ROOT'
 }
}}, {$addFields: {
 lastRevision: {
  $last: '$items'
 }
}}]

Let's go through this: You are creating an object where each record in the collection that shares the same guide are arranged as an array. Since the most recent revision is at a greater index than previous revisions, you just pick of the last item of the array.

  • Related