I need to fast count related documents.
So, I have four collections
groups
{ "_id" : "g1", "name" : "group1" }
{ "_id" : "g2", "name" : "group2" }
courses
{ "_id" : "c1", "name" : "course1", "group_id" : "g1" }
{ "_id" : "c2", "name" : "course2", "group_id" : "g2" }
topics
{ "_id" : "t1", "name" : "top1c11", "course_id" : "c1" }
{ "_id" : "t2", "name" : "top1c12", "course_id" : "c1" }
{ "_id" : "t3", "name" : "top1c21", "course_id" : "c2" }
lessons
{ "_id" : "l1", "name" : "lesson111", "topic_id" : "t1" }
{ "_id" : "l2", "name" : "lesson112", "topic_id" : "t1" }
{ "_id" : "l3", "name" : "lesson121", "topic_id" : "t2" }
{ "_id" : "l4", "name" : "lesson211", "topic_id" : "t3" }
I need count all lessons of the specific group.
I tried to run the following aggregation, but I didn't wait for an response. (But it working for small amount of data)
db.getCollection('lessons').aggregate([
{
"$lookup": {
"from": "topics",
"let": { "topicId": "$topic_id" },
"pipeline": [
{
"$match": { "$expr": { "$eq": [ "$_id", "$$topicId" ] } }
},
{
"$lookup": {
"from": "courses",
"let": { "courseId": "$topic_id" },
"pipeline": [
{ "$match": { "$expr": { "$eq": [ "$course_id", "$$courseId" ] } } },
],
"as": "course"
},
},
{
"$unwind": "$course"
}
],
"as": "topic"
},
},
{
"$unwind" : "$topic"
},
{
"$match": {
"topic.course.group_id" : "g1"
}
},
{
$group: {
_id: "$course",
"amount": {$sum:1},
}
}
])
I believe this aggregation can be optimized. But I don`t sure that is a good approach to use aggregation framework for such purpose. If so, how can I optimize the aggregation.
Size of collections (test data):
- courses: 30000
- topics: 200000
- lessons: 30000000
Now I use simple nested loops in my code to count lessons. This takes 10 seconds (for 3000 topics of the certain group).
CodePudding user response:
Query1
- not nested lookups (lookup and unwind)
- match the group
- lookup and unwind 3x, last lookup only counts the lessons, and uses pipeline lookup
- group by group
_id
, to get the total lessons
Indexes that you need (all the foreignField)
courses.group_id
topics.course_id
lessons.topic_id
groups.aggregate(
[{"$match":{"_id":"g1"}},
{"$lookup":
{"from":"courses",
"localField":"_id",
"foreignField":"group_id",
"as":"courses"}},
{"$unwind":"$courses"},
{"$lookup":
{"from":"topics",
"localField":"courses._id",
"foreignField":"course_id",
"as":"topics"}},
{"$unwind":"$topics"},
{"$lookup":
{"from":"lessons",
"localField":"topics._id",
"foreignField":"topic_id",
"pipeline":
[{"$group":{"_id":null, "lessons":{"$sum":1}}},
{"$set":{"id":"$_id", "_id":"$$REMOVE"}}],
"as":"lessons"}},
{"$set":
{"lessons":
{"$cond":
[{"$eq":["$lessons", []]}, 0,
{"$arrayElemAt":["$lessons.lessons", 0]}]}}},
{"$group":{"_id":"$_id", "totalLessons":{"$sum":"$lessons"}}}])
Query2
- nested lookups (without unwind)
- code is the same, just nested
groups.aggregate(
[{"$match":{"_id":"g1"}},
{"$lookup":
{"from":"courses",
"localField":"_id",
"foreignField":"group_id",
"pipeline":
[{"$lookup":
{"from":"topics",
"localField":"_id",
"foreignField":"course_id",
"pipeline":
[{"$lookup":
{"from":"lessons",
"localField":"_id",
"foreignField":"topic_id",
"pipeline":
[{"$group":{"_id":null, "lessons":{"$sum":1}}},
{"$set":{"id":"$_id", "_id":"$$REMOVE"}}],
"as":"lessons"}},
{"$set":
{"lessons":
{"$cond":
[{"$eq":["$lessons", []]}, 0,
{"$arrayElemAt":["$lessons.lessons", 0]}]}}}],
"as":"topics"}},
{"$project":
{"_id":0, "totalLessons":{"$sum":"$topics.lessons"}}}],
"as":"courses"}},
{"$set":
{"courses":"$$REMOVE",
"totalLessons":
{"$cond":
[{"$eq":["$courses", []]}, 0,
{"$arrayElemAt":["$courses.totalLessons", 0]}]}}}])
If you can send some feedback on which one was faster.
If for 1 group its very fast, maybe remove the match, to do it for all groups, or allow from match to pass more many groups.
CodePudding user response:
Solution from comment of Takis. Query1, adopted for 4.2
groups.aggregate(
[{"$match":{"_id":"g1"}},
{"$lookup":
{"from":"courses",
"localField":"_id",
"foreignField":"group_id",
"as":"courses"}},
{"$unwind":"$courses"},
{"$lookup":
{"from":"topics",
"localField":"courses._id",
"foreignField":"course_id",
"as":"topics"}},
{"$unwind":"$topics"},
{"$lookup":
{"from":"lessons",
"pipeline":
[{"$match":{"$expr":{"$eq":["$$ptopic", "$topic_id"]}}},
{"$group":{"_id":null, "lessons":{"$sum":1}}},
{"$set":{"id":"$_id", "_id":"$$REMOVE"}}],
"as":"lessons",
"let":{"ptopic":"$topics._id"}}},
{"$set":
{"lessons":
{"$cond":
[{"$eq":["$lessons", []]}, 0,
{"$arrayElemAt":["$lessons.lessons", 0]}]}}},
{"$group":{"_id":"$_id", "totalLessons":{"$sum":"$lessons"}}}])