Find duplicate values inside an array of objects in MongoDB in single record

Time:04-07

I'm new to writing queries in MongoDB. I have this single record, and I need to find the sections that share the same link.

My record:

{
    "_id" : ObjectId("1234"),
    "name": "singleRecordInMongo",
    "__v" : 0,
    "sections" : [ 
        {
            "name" : "firstElement",
            "link" : "https://www.test.com/",
            "_id" : ObjectId("624dd0aca5fb565661da1161")
        }, 
        {
            "name" : "secondElement",
            "link" : "https://www.test.com/",
            "_id" : ObjectId("624dd0aca5fb565661da1162")
        }, 
        {
            "name" : "thirdElement",
            "link" : "https://www.other.com",
            "_id" : ObjectId("624dd0aca5fb565661da1163")
        }
   ]
}

Expected result:

    "sections" : [ 
        {
            "times" : 2,
            "link" : "https://www.test.com/"
        }
   ]

I tried something like this, but it didn't work:

db.getCollection('records').aggregate(
  {$unwind: "$sections"},
  { $project: {_id: '$_id', value: '$sections'} },
  { $group: {
        _id: null, 
        occurances: {$push: {'value': '$link', count: '$count'}}
        }
   }
);

CodePudding user response:

You can use $unwind followed by $group:

db.collection.aggregate([
  {$unwind: "$sections"},
  {
    $group: {
      _id: "$sections.link",
      orig_id: {$first: "$sections._id" },
      count: {$sum: 1 }
    }
  },
  {$match: { "count": {$gt: 1 }}},
  {
    $group: {
      _id: 0,
      sections: {$push: { link: "$_id", count: "$count"}}
    }
  }
])

Run against a collection that holds only the sample record, this returns:

  {
    "_id": 0,
    "sections": [
      {
        "count": 2,
        "link": "https://www.test.com/"
      }
    ]
  }
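
To see what each stage contributes, the same logic can be sketched in plain JavaScript and run with Node, without a database. This is only an illustration under stated assumptions: findDuplicateLinks is a hypothetical helper name, not a MongoDB API, and the ObjectId values from the sample record are left out for brevity.

```javascript
// Sample document from the question (ObjectId values omitted for brevity).
const record = {
  name: "singleRecordInMongo",
  sections: [
    { name: "firstElement", link: "https://www.test.com/" },
    { name: "secondElement", link: "https://www.test.com/" },
    { name: "thirdElement", link: "https://www.other.com" },
  ],
};

// Hypothetical helper (not a MongoDB API) that mirrors the pipeline stages.
function findDuplicateLinks(doc) {
  const counts = new Map();

  // $unwind + first $group: visit each section and count occurrences per link.
  for (const section of doc.sections) {
    counts.set(section.link, (counts.get(section.link) || 0) + 1);
  }

  // $match + second $group: keep links seen more than once, collect them in one array.
  const sections = [];
  for (const [link, count] of counts) {
    if (count > 1) {
      sections.push({ link, count });
    }
  }
  return { _id: 0, sections };
}

console.log(JSON.stringify(findDuplicateLinks(record)));
// prints {"_id":0,"sections":[{"link":"https://www.test.com/","count":2}]}
```

Note that the aggregation groups over every document in the collection, so on a collection with more than one record a leading $match stage would be needed to restrict it to the record of interest.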

CodePudding user response:

Alternatively, the $function operator (available since MongoDB 4.4) lets you run JavaScript inside the aggregation, where a hash map can keep track of duplicates:

db.records.aggregate([
   { $addFields: {
      sections: {
         $map: {
            input: "$sections",
            in: { times: 1, link: "$$this.link" }
         }
      }
   } },
   { $addFields: {
      sections: {
         $filter: {
            input: {
               $function: {
                  body: function (data) { 
                     const map = {}; 
                     data = data.map((item) => { 
                        if (map[item.link]) {
                           map[item.link].times += item.times
                        } else { 
                           map[item.link] = item; 
                        }
                        return item; 
                     }); 
                     
                     return data.filter((item) => item !== undefined); 
                  },
                  args: [ "$sections" ],
                  lang: "js"
               }
            },
            cond: { $gt: ["$$this.times", 1] }
         }
      }
   } }
])

Bear in mind:

Executing JavaScript inside an aggregation expression may decrease performance. Only use the $function operator if the provided pipeline operators cannot fulfill your application's needs.
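
Because the body passed to $function is ordinary JavaScript, the counting logic can be checked outside MongoDB before it goes into a pipeline. The following is a minimal sketch, assuming the intent is to accumulate times on the first item seen for each link (hence the += in the loop); countByLink is a hypothetical name used only for this illustration.

```javascript
// Hypothetical standalone version of the $function body.
function countByLink(data) {
  const seen = {}; // link -> first item encountered with that link
  data.forEach((item) => {
    if (seen[item.link]) {
      // Later occurrence: add its count to the first item with the same link.
      seen[item.link].times += item.times;
    } else {
      seen[item.link] = item;
    }
  });
  return data; // same items, with the first occurrence of each link holding the total
}

// Input shaped like the output of the first $addFields stage.
const input = [
  { times: 1, link: "https://www.test.com/" },
  { times: 1, link: "https://www.test.com/" },
  { times: 1, link: "https://www.other.com" },
];

// Equivalent of the $filter condition { $gt: ["$$this.times", 1] }.
const duplicates = countByLink(input).filter((item) => item.times > 1);
console.log(JSON.stringify(duplicates));
// prints [{"times":2,"link":"https://www.test.com/"}]
```

Only the first occurrence of a link ends up with times greater than 1, which is why the later $filter stage leaves a single entry per duplicated link.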
