Please lemme know why the performance is so poor in aggregation query in my examples Query 1, count of records in my audio_details collection is around 3M and sample records are like :
{
_id : xxx,
status : 'BUY', /* This is indexed field */
active : 't', /* This is indexed field */
created_date2 : "2022-09-23T09:00:00.000Z", /* This is indexed field */
audio_details : [
{id : 123 /* This is indexed field */ , created_date : "2022-XXX", /* Other fields goes here */},
{id : 124 /* This is indexed field */ , created_date : "2022-XXX", /* Other fields goes here */},
...
],
/* Other 60 fields goes here */
}
Query 1: This is very slow (300 s)
db.audio_details.aggregate([{$match : { status : 'BUY',active: 't','audio_history.id' : {$in: [123]}}}, {$sort : {created_date2 : -1}}]);
Query 2: This is very fast (0.5 s)
db.audio_details.find({ status : 'BUY',active: 't','audio_history.id' : {$in: [123]}
}).sort({created_date2 : -1})
Please share why the query 1 is slow Regards Kris
CodePudding user response:
This appears to be a duplicate of why would identical mongo query take much longer via aggregation than via find? We can therefore make the following observations:
- The issue linked from that answer sees to now be fixed. So upgrading to version
4.4
may resolve the issue. - The sample operation that you've shown can be handled using just
find()
(withsort()
). But in the comments you mention that "want to use$sort
in our application". Is there some specific requirement to use the aggregation framework for these particular operations? It seems that you've demonstrated that there is no issue when using the equivalent.find()
. - In either case, you mention "indexed fields" in your question, but don't actually describe what the index definitions are. If these are single field indexes, then you may want to think about how you can restructure them as compound indexes.
Keep in mind that databases, MongoDB included, are usually most effective at using a single index per data source (collection in this situation) per operation. The only compelling reasons to have a single field index on {created_date2: 1}
would be if it is a TTL index or if you are issuing queries where created_date2
is the only or most selective predicate. You should consider dropping such an index (and incorporating that field in a compound index per the third point above) if none of these conditions apply in your situation.