Find duplicate name in MongoDB

I'm having trouble finding the duplicate names in my MongoDB collection so that I can delete the duplicates.

{
    "users": [
      {
        "_id": {
          "$oid": "61441890a6566a001623b8ed"
        },
        "name": "Jollibee"
      },
      {
        "_id": {
          "$oid": "61441890a6566a001623b8ed"
        },
        "name": "Jollibee"
      },
      {
        "_id": {
          "$oid": "61441890a6566a001623b8ed"
        },
        "name": "MCDO"
      },
      {
        "_id": {
          "$oid": "61441890a6566a001623b8ed"
        },
        "name": "Burger King"
      }
    ]
  }

I want my output to show only the duplicate names, which in this case is Jollibee.

I tried this approach, but it returns the count of all the users, not just the duplicated ones. I want the output to show only the 2 Jollibee entries.

db.collection.aggregate([
  {
    "$unwind": "$users"
  },
  {
    "$group": {
      "_id": "$_id",
      "count": {
        "$sum": 1
      }
    }
  },
  {
    "$match": {
      "_id": {
        "$ne": null
      },
      "count": {
        "$gt": 1
      }
    }
  }
])

Answer:

Suppose the documents are:

[
    {
        "_id": {
            "$oid": "6226dd742ef592186422ad1d"
        },
        "name": "Stack test"
    },
    {
        "_id": {
            "$oid": "6226dd7d2ef592186422ad1e"
        },
        "name": "Stack test"
    },
    {
        "_id": {
            "$oid": "6226dd912ef592186422ad1f"
        },
        "name": "Stack test 001"
    }
]

Aggregation Query:

db.users.aggregate(
    [
        {
            $group: {
                _id: "$name", 
                names: {$push: "$name"}
            }
        }
    ]
)

Result:

{ 
    _id: 'Stack test', 
    names: [ 'Stack test', 'Stack test' ] 
},
{ 
    _id: 'Stack test 001', 
    names: [ 'Stack test 001' ] 
}

But a better way to do it is to count the documents for each name:

Aggregation Query:

db.users.aggregate(
    [
        {
            $group: {
                _id: "$name", 
                count: {$sum: 1}
            }
        }
    ]
)

Result:

{ 
    _id: 'Stack test',
    count: 2 
},
{ 
    _id: 'Stack test 001', 
    count: 1 
}

Now you can iterate over these results, check each count, and use the name stored in _id for any group whose count is greater than 1.
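The question's goal is to delete the duplicates, so here is a minimal sketch of that iteration. It assumes the layout from this answer: a users collection whose documents have a name field. It also assumes that the document with the lowest _id in each group is the one to keep. The pipeline first sorts by _id so the order is defined. It then collects each group's _id values with $push and uses $match to keep only repeated names. The helper idsToDelete is hypothetical. The db.users calls are left as comments because they need a live mongo shell.

```javascript
// Sort first so each group's ids array has a defined order, then group by
// name, remember every _id in the group, and keep only repeated names.
const duplicatesPipeline = [
  { $sort: { _id: 1 } },
  { $group: { _id: "$name", ids: { $push: "$_id" }, count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } },
];

// Hypothetical helper: given the aggregation output, return the _ids to
// delete. Assumption: the first (lowest) _id in each group is the one kept.
function idsToDelete(groups) {
  const result = [];
  for (const group of groups) {
    result.push(...group.ids.slice(1));
  }
  return result;
}

// In the mongo shell (needs a live connection, so shown as comments):
//   const groups = db.users.aggregate(duplicatesPipeline).toArray();
//   db.users.deleteMany({ _id: { $in: idsToDelete(groups) } });

// Aggregation output you would expect for the sample documents above
// (ObjectIds written as plain strings here for readability):
const exampleGroups = [
  {
    _id: "Stack test",
    ids: ["6226dd742ef592186422ad1d", "6226dd7d2ef592186422ad1e"],
    count: 2,
  },
];
console.log(idsToDelete(exampleGroups)); // [ '6226dd7d2ef592186422ad1e' ]
```

You can run the aggregation alone first, without the deleteMany line, to check which documents would be removed.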

Answer:

Since the $unwind step gives every output document the same _id (the _id of the parent document), grouping by _id just counts all the users. Instead, try grouping by users.name:

db.collection.aggregate([
  {
    "$unwind": "$users"
  },
  {
    "$group": {
      "_id": "$users.name",
      "count": {
        "$sum": 1
      }
    }
  },
  {
    "$match": {
      "_id": {
        "$ne": null
      },
      "count": {
        "$gt": 1
      }
    }
  }
])

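The question also asks to see the two Jollibee entries, not only their count. Here is a minimal sketch that assumes the users array layout from the question. Adding a $push of "$users" to the $group stage returns the duplicated sub-documents next to the count. The function duplicateUsers is hypothetical. It mirrors the $unwind, $group and $match stages on a plain array, with the sample sub-documents shortened to their name field, so you can check the expected result without a database.

```javascript
// Same pipeline as above, but $group also collects the duplicated
// user sub-documents so both entries appear in the output.
const pipeline = [
  { $unwind: "$users" },
  {
    $group: {
      _id: "$users.name",
      count: { $sum: 1 },
      users: { $push: "$users" },
    },
  },
  { $match: { _id: { $ne: null }, count: { $gt: 1 } } },
];
// mongo shell: db.collection.aggregate(pipeline)

// Hypothetical in-memory mirror of the three stages above.
function duplicateUsers(docs) {
  const groups = new Map();
  for (const doc of docs) {
    for (const user of doc.users || []) { // $unwind
      const name = user.name ?? null;
      if (!groups.has(name)) groups.set(name, []);
      groups.get(name).push(user); // $group with $push
    }
  }
  return [...groups]
    .filter(([name, users]) => name !== null && users.length > 1) // $match
    .map(([name, users]) => ({ _id: name, count: users.length, users }));
}

// The question's data, with each sub-document reduced to its name.
const docs = [
  {
    users: [
      { name: "Jollibee" },
      { name: "Jollibee" },
      { name: "MCDO" },
      { name: "Burger King" },
    ],
  },
];
console.log(duplicateUsers(docs));
```

For the question's data, this yields a single group, Jollibee, with a count of 2 and both matching entries in users.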