relative noob. I have a MongoDB collection, where it appears one of the keys has been duplicated within a document. The document count is 13603, but aggregating by the key and counting results in 13604. I have run this 3 times, 30 minutes apart, so I know it's not a timing issue. I am trying to find the document with the duplicate key, but don't understand aggregations enough to find it.
I found a similar thread here but I see no solution for finding the "corrupt" document within a collection.
This is NOT a duplicate key across documents or a duplicate document issue; it is a duplicate key within the same document issue. Any help is appreciated.
screen-shot comparing document count to key-aggregation count
CodePudding user response:
It is most probably incorrect:
db.collection.count()
Check this ticket here
Try to count with this:
db.collection.countDocuments({})
The db.collection.count() is just reading the collection count metadata , it is fast but inacurate especially in sharded cluster since sometimes there is orphaned documents not updated in collection metadata , you need to clean orphans and then try again.
From the documentation:
Avoid using the db.collection.count() method without a query predicate since without the query predicate, the method returns results based on the collection's metadata, which may result in an approximate count. In particular,
On a sharded cluster, the resulting count will not correctly filter out orphaned documents. After an unclean shutdown, the count may be incorrect. For counts based on collection metadata, see also collStats pipeline stage with the count option.