Home > Net >  Mongo db aggregation - $push and $slice top results
Mongo db aggregation - $push and $slice top results

Time:08-16

I have the following documents in my db:

{uid: 1, score: 10}
{uid: 2, score: 11}
{uid: 3, score: 1}
{uid: 4, score: 6}
{uid: 5, score: 2}
{uid: 6, score: 3}
{uid: 7, score: 8}
{uid: 8, score: 10}

I want to split them into buckets by score - i.e.:

score uids (bucket name in aggregation)
[0,4) 3,5,6 0
[4,7) 4 4
[7,inf 1,2,7,8 7

For this, I created the following aggregation which works just fine:

db.scores.aggregation(
    [
        {
            $bucket: 
                {
                    groupBy: "$score",
                    boundaries: [0, 4, 7],
                    default: 7,
                    output:
                        {
                            "total": {$sum: 1},
                            "top_frustrated": 
                                {
                                    $push: {
                                        "uid": "$uid", "score": "$score"
                                    }
                                },
                        },
                }
        },
    ]
)

However, I would like to return only the top 3 of every bucket - i.e, buckets 0, 4 should be the same, but bucket 7 should have only uids 1,2,8 returned (as uid 7 has the lowest score) - but to include the total count of documents as well, i.e. output of bucket "7" should look like:

{ "total" : 4, "top_scores" : 
    [
        {"uid" : 2, "score" : 11},
        {"uid" : 1, "score" : 10},
        {"uid" : 8, "score" : 10},
    ]
}

I tried using $addFields with $sortArray and $slice, but it either won't work or return errors.

I can of course use $project but I was wondering if there is a more efficient way.

I am using Amazon DocumentDB.

CodePudding user response:

You can use the $topN accumulator, instead of $push, like this:

db.collection.aggregate([
  {
    "$bucket": {
      "groupBy": "$score",
      "boundaries": [
        0,
        4,
        7
      ],
      "default": 7,
      "output": {
        "total": {
          "$sum": 1
        },
        "top_frustrated": {
          "$topN": {
            "n": 3,
            "sortBy": {
              "score": -1
            },
            "output": {
              "uid": "$uid",
              "score": "$score"
            }
          }
        }
      },  
    }
  },
])

Playground link.

The only catch here is this operator is present in MongoDB 5.2 and above.

For older versions, this will work:

db.collection.aggregate([
  {
    "$sort": {
      score: -1
    }
  },
  {
    $bucket: {
      groupBy: "$score",
      boundaries: [
        0,
        4,
        7
      ],
      default: 7,
      output: {
        "total": {
          $sum: 1
        },
        "top_frustrated": {
          $push: {
            "uid": "$uid",
            "score": "$score"
          }
        },
        
      },
      
    }
  },
  {
    "$project": {
      total: 1,
      top_frustrated: {
        "$slice": [
          "$top_frustrated",
          3
        ]
      }
    }
  }
])

Playground link.

  • Related