We have a MongoDB collection that has duplicate revisions for the same transaction.
Our ask is to cleanup this collection and retain only the latest revision of each transaction. I wrote a script that does remove the duplicates but for some records it does not keep the latest revision. The script is available here for reference: https://gist.github.com/praveenkrjha/1f02ed675e297a2c85e9020e39320a41
Initial record (tid is the transaction id and rv is the revision number):
_id: 61b994a48d546d882462da25, tid: b1def2a8-01e0-4f8c-a859-7cd63287617c, rv: 42
_id: 61b994a48d546d882462da27, tid: b1def2a8-01e0-4f8c-a859-7cd63287617c, rv: 42
_id: 61b994a48d546d882462da29, tid: b1def2a8-01e0-4f8c-a859-7cd63287617c, rv: 42
_id: 61b994a48d546d882462da2b, tid: d9b9e197-6f91-480c-8757-2a24330bc4c5, rv: 21
_id: 61b994a48d546d882462da2d, tid: d9b9e197-6f91-480c-8757-2a24330bc4c5, rv: 22
_id: 61b994a48d546d882462da2f, tid: df0f1252-51d3-4471-be68-a81e8b9a11e2, rv: 31
_id: 61b994a48d546d882462da31, tid: df0f1252-51d3-4471-be68-a81e8b9a11e2, rv: 32
_id: 61b994a48d546d882462da33, tid: bc92c1ee-91b4-4eea-b9ad-752f8dba6c54, rv: 11
_id: 61b994a48d546d882462da35, tid: bc92c1ee-91b4-4eea-b9ad-752f8dba6c54, rv: 12
_id: 61b994a48d546d882462da37, tid: 2a131e9a-07f1-413f-a21f-4f1bbbb3d460, rv: 1
_id: 61b994a48d546d882462da39, tid: e2c7fb60-a1e2-46ea-8d6a-e94e14ded2d3, rv: 1
_id: 61b994a48d546d882462da3b, tid: 314f7cb2-1aad-4add-b2e1-17e910d97713, rv: 1
Final result after clean-up:
_id: 61b994a48d546d882462da25, tid: b1def2a8-01e0-4f8c-a859-7cd63287617c, rv: 42
_id: 61b994a48d546d882462da2b, tid: d9b9e197-6f91-480c-8757-2a24330bc4c5, rv: 21
_id: 61b994a48d546d882462da2f, tid: df0f1252-51d3-4471-be68-a81e8b9a11e2, rv: 31
_id: 61b994a48d546d882462da33, tid: bc92c1ee-91b4-4eea-b9ad-752f8dba6c54, rv: 11
_id: 61b994a48d546d882462da37, tid: 2a131e9a-07f1-413f-a21f-4f1bbbb3d460, rv: 1
_id: 61b994a48d546d882462da39, tid: e2c7fb60-a1e2-46ea-8d6a-e94e14ded2d3, rv: 1
_id: 61b994a48d546d882462da3b, tid: 314f7cb2-1aad-4add-b2e1-17e910d97713, rv: 1
As you can see from final result, it deletes the latest revision for some transaction, which should not be done. Any pointers on how that can be achieved?
CodePudding user response:
Try this one:
db.collection.aggregate([
{ $sort: { rv: -1 } },
{ $group: { _id: "$tid", data: { $first: "$$ROOT" } } },
{ $replaceWith: "$data" }
])
or
db.collection.aggregate([
{ $sort: { rv: 1 } },
{ $group: { _id: "$tid", data: { $last: "$$ROOT" } } },
{ $replaceWith: "$data" }
])