I'm working on an aggregation query for mongoDB to match all of the records in a specific time and then group them with the group_id field. This is my query:
{
$match: {
"created": {
$gte: // Any date possbile
$lt: new Date(Date.now())
}
}
}, {
$group: {
_id: "$group_id",
records: {
$push: "$$ROOT"
}
},
}
}];
However when I run the query for large dataset I get this error
Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in
These are my questions
- Is it safe to set the
allowDiskUse:true
in production database? - Is there any other way to write this query?
CodePudding user response:
- Yes.
- "Not really." If you want to group by the
group_id
and there are a lot of them, then that's that.
See also: Robomongo : Exceeded memory limit for $group
CodePudding user response:
Is it safe to set the allowDiskUse:true in production database?
As long as you don't run the server out of disk space, sure.
Is there any other way to write this query?
Probably.
The first thing to note is that the $group
stage will return a single document per distinct "$group_id"
. Which means that in order to actually return that document to the application, it will need to fit within the 16MB document size limit.
The memory limit for each group formed by the $group stage is 100MB by default. In that case, you will already need to break up the resulting document in order to return it.
For the simple pipeline you show, you could get a similar effect by simply sorting by {group_id:1}
. Each document would be returned separately, but all of the documents with the same group_id
value will be together in the list.
If you really need to have the entire documents in enormous arrays per group_id, you can iterate the aggregation result in the application and form the arrays there.