I have recently shifted to MongoDB
and Mongoose
with Node.js
. And I am wrapping my head around it all coming from SQL
.
I have a collection where documents have a similar structure to the following:
{
name: String
rank: Number
}
Sometimes the name
might be the same, but the rank
will always be different.
I would like to remove all duplicates of name
, but retain the object that has the LOWEST rank
.
For instance, if my collection looked like this:
{
name: "name1"
rank: 3
},
{
name: "name1"
rank: 4
},
{
name: "name1"
rank: 2
}
I would like to remove all objects where name
is the same except for:
{
name: "name1"
rank: 2
}
Is this possible to do with mongoose?
CodePudding user response:
Here is my approach:
const found = await db.collection.aggregate([
{
$group: {
_id: "$name",
minRank: {
$min: "$rank"
}
}
},
])
await db.collection.deleteMany({
$or: found.map(item => ({
name: item._id,
rank: { $ne: item.minRank }
}))
})
Explanation:
From my point of view your solution would result in many unnecessary calls being made, which would result in a terrible time of execution. My solution exactly contains two steps:
- find for each document's property
name
the corresponding lowestrank
available. - delete each document, where the name is equal to one of those names and the rank is not equal to the actual lowest rank found.
Additional notes:
If not already done, you should probably define an index on the name
property of your schema for performance reasons.
CodePudding user response:
Okay, I figured it out using aggregate
:
const duplicates = await collectionName.aggregate([
{
$group: {
_id: "$name",
dups: { $addToSet: "$_id" },
count: { $sum: 1 }
}
},
{
$match: {
count: { $gt: 1 }
}
}
]);
duplicates.forEach(async (item) => {
const duplicate_names = item.dups;
const duplicate_name = await collectionName.find({ _id: { $in: duplicate_names } }).sort({ rank: 1 });
duplicate_name.shift();
duplicate_name.forEach(async (item) => {
await collectionName.deleteOne({ _id: item._id });
});
});