Limit number of objects pushed to array in MongoDB aggregation-CodePudding

I've been trying to find a way to limit the number of objects i'm pushing to arrays I'm creating while using "aggregate" on a MongoDB collection. I have a collection of students - each has these relevant keys: class number it takes this semester (only one value), percentile in class (exists if is enrolled in class, null if not), current score in class (> 0 if enrolled in class, else - 0), total average (GPA), max grade

I need to group all students who never failed, per class, in one array that contains those with a GPA higher than 80, and another array containing those without this GPA, sorted by their score in this specific class.

This is my query:

db.getCollection("students").aggregate([
{"$match": {
    "class_number": 
        {"$in": [49, 50, 16]},
    "grades.curr_class.percentile": 
        {"$exists": true},
    "grades.min": {"$gte": 80},
    }},
    {"$sort": {"grades.curr_class.score": -1}},
    {"$group": {"_id": "$class_number", 
                "studentsWithHighGPA": 
                    {"$push": 
                        {"$cond": [{"$gte": ["$grades.gpa", 80]},
                        {"id": "$_id"},
                        "$$REMOVE"]
                        }
                    },
                 "studentsWithoutHighGPA":
                 {"$push": 
                        {"$cond": [{"$lt": ["$grades.gpa", 80]},
                        {"id": "$_id"},
                        "$$REMOVE"]
                        }, 
                    },
                    }, 
                },
])

What i'm trying to do is limit the number of students in each of these arrays. I only want the top 16 in each array, but i'm not sure how to approach this.

Thanks in advance!

I've tried using limit in different variations, and slice too, but none seem to work.

CodePudding user response：

I don't think there is a mongodb-provided operator to apply a limit inside of a $group stage.

You could use $accumulator, but that requires server-side scripting to be enabled, and may have performance impact.

Limiting studentsWithHighGPA to 16 throughout the grouping might look something like:

      "studentsWithHighGPA": {
        "$accumulator": {
          init: "function(){
                     return {combined:[]};
          }",
          accumulate: "function(state, id, score){
                              if (score >= 80) {
                                    state.combined.push({_id:id, score:score})
                              };
                              return {combined:state.combined.slice(0,16)}
          }",
          accumulateArgs: [ "$_id", "$grades.gpa"],
          merge: "function(A,B){
                     return {combined: 
                              A.combined.concat(B.combined).sort(
                                    function(SA,SB){
                                           return (SB.score - SA.score)
                                    })
                            }
          }",
          finalize: "function(s){
               return s.combined.slice(0,16).map(function(A){
                  return {_id:A._id}
               })
          }",
          lang: "js"
        }
      }

Note that the score is also carried through until the very end so that partial result sets from different shards can be combined properly.

CodePudding user response：

Maybe MongoDB $facets are a solution. You can specify different output pipelines in one aggregation call.

Something like this:

const pipeline = [
    {
        '$facet': {
            'studentsWithHighGPA': [
                { '$match': { 'grade': { '$gte': 80 } } }, 
                { '$sort': { 'grade': -1 } }, 
                { '$limit': 16 }
            ], 
            'studentsWithoutHighGPA': [
                { '$match': { 'grade': { '$lt': 80 } } },
                { '$sort': { 'grade': -1 } },
                {  '$limit': 16 }
            ]
        }
    }
];

    
coll.aggregate(pipeline)

This should end up with one document including two arrays.

studentsWithHighGPA (array)
    0 (object)
    1 (object)
    ...
studentsWithoutHighGPA (array)
    0 (object)
    1 (object)

See each facet as an aggregation pipeline on its own. So you can also include $group to group by classes or something else.

https://www.mongodb.com/docs/manual/reference/operator/aggregation/facet/