I have a collection with 300 million documents. Each document has a user_id
field like the following:
{
"user_id": "1234567",
// and other fields
}
I want to get a list of the unique user_ids in the collection, but the following mongo shell command results in an error:
db.collection.aggregate([
{ $group: { _id: null, user_ids: { $addToSet: "$user_id" } } }
], { allowDiskUse: true });
2021-11-23T14:50:28.163+0900 E QUERY [js] uncaught exception: Error: command failed: {
"ok" : 0,
"errmsg" : "Error on remote shard <host>:<port> :: caused by :: BSONObj size: 46032166 (0x2BE6526) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: null",
"code" : 10334,
"codeName" : "BSONObjectTooLarge",
"operationTime" : Timestamp(1637646628, 64),
...
} : aggregate failed :
Why does the error occur even with the allowDiskUse: true option? The database version is 4.2.16.
CodePudding user response:
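You are trying to collect all unique user_ids into a single document, but that document grows larger than the 16MB BSON document size limit, which causes the error. allowDiskUse only lets pipeline stages spill to temporary files on disk; it does not raise the size limit on a result document.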
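One way around the limit, sketched under the assumption that the list does not have to arrive as a single array, is to group on user_id itself so that each unique value becomes its own small result document, and then read the values from the cursor. The name distinctUserIdsPipeline is made up for this illustration, and the typeof db guard is only there so the snippet can also be loaded outside the mongo shell.

```javascript
// One output document per distinct user_id, shaped like { _id: "1234567" },
// so no single document has to hold the whole set.
const distinctUserIdsPipeline = [
  { $group: { _id: "$user_id" } }
];

// Runs only inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  const cursor = db.collection.aggregate(distinctUserIdsPipeline, { allowDiskUse: true });
  // Each document is small; the cursor streams them in batches.
  cursor.forEach(doc => print(doc._id));
}
```

allowDiskUse is still useful here, because the $group stage itself may need more memory than the per-stage limit when there are many unique values.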
CodePudding user response:
distinct may be more useful:
db.collection.distinct( "user_id" )
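Note that distinct returns all values as a single array inside one reply document, so it is subject to the same 16MB limit and may fail in the same way when there are many unique values. The following is a rough sketch for judging whether the result could fit. It assumes every user_id is a string of about the same byte length and counts bytes according to the BSON array layout. The function name estimateBsonStringArraySize and the sample numbers are hypothetical.

```javascript
// Documented maximum BSON document size (16MB).
const MAX_BSON_SIZE = 16 * 1024 * 1024;

// Estimates the BSON size of an array of `count` strings, each
// `idByteLength` bytes long. A BSON array is stored as a document whose
// keys are the decimal indexes "0", "1", "2", ...
function estimateBsonStringArraySize(count, idByteLength) {
  let size = 5;    // int32 length prefix + trailing null byte
  let start = 0;   // first index that has `digits` decimal digits
  let end = 10;    // exclusive upper bound of indexes with `digits` digits
  let digits = 1;
  while (start < count) {
    const n = Math.min(end, count) - start;
    // Per element: type byte + key digits + key null
    //              + int32 string length + string bytes + string null
    size += n * (1 + digits + 1 + 4 + idByteLength + 1);
    start = end;
    end *= 10;
    digits += 1;
  }
  return size;
}

// Hypothetical example: one million unique 7-byte ids.
const estimated = estimateBsonStringArraySize(1000000, 7);
const fits = estimated <= MAX_BSON_SIZE;
```

With these sample numbers the estimate is above the limit, so fits is false. In that situation, a $group on user_id that returns one document per value avoids building a single large array.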