Home > Blockchain >  How can I project top 5 counts and sum the rest in MongoDB?
How can I project top 5 counts and sum the rest in MongoDB?

Time:06-08

I have the following documents:

_id: "Team 1"
count: 1200

_id: "Team 2"
count: 1170

_id: "Team 3"
count: 1006

_id: "Team 4"
count: 932

_id: "Team 5"
count: 931

_id: "Team 6"
count: 899

_id: "Team 7"
count: 895

The list is already sorted and everything, I just need to project this as an array of top 5 based on count and then the rest should be summed as 'others'. If possible I'd like to also add the percentage that each element in the list makes up of the full count. Like this:

[
  {"name":"Team 1", "count":1200, "percent":25},
  {"name":"Team 2", "count":1170,"percent":15},
  {"name":"Team 3", "count":1006,"percent":10},
  {"name":"Team 4", "count":932,"percent":5},
  {"name":"Team 5", "count":931,"percent":5},
  {"name":"Other", "count":1794, "percent":40}]
]

CodePudding user response:

Query

  • $setWindowFields to sort and add the sort-rank to each document
  • group by null with 3 accumulators
    • push the first 5 documents unchanged
    • sum the count of the rest (rank>5)
    • total sum
  • $map to divide the counts with the total sum for the 5 top documents, to get the percentage also
  • add also the percentage for the rest of documents
  • unwind and replace the root, with those documents that have count and percentage

Playmongo (put the mouse at the end of each stage to see the stage in and out)

aggregate(
[{"$setWindowFields": 
   {"output": {"rank": {"$rank": {}}}, "sortBy": {"count": -1}}},
 {"$group": 
   {"_id": null,
    "top5": 
     {"$push": {"$cond": [{"$lte": ["$rank", 5]}, "$$ROOT", "$$REMOVE"]}},
    "other": {"$sum": {"$cond": [{"$lte": ["$rank", 5]}, 0, "$count"]}},
    "all": {"$sum": "$count"}}},
 {"$project": 
   {"_id": 0,
    "docs": 
     {"$concatArrays": 
       [{"$map": 
           {"input": "$top5",
            "in": 
             {"name": "$$this._id",
              "count": "$$this.count",
              "percentage": 
               {"$multiply": [{"$divide": ["$$this.count", "$all"]}, 100]}}}},
         [{"name": "other",
            "count": "$other",
            "percentage": 
             {"$multiply": [{"$divide": ["$other", "$all"]}, 100]}}]]}}},
 {"$unwind": "$docs"}, {"$replaceRoot": {"newRoot": "$docs"}}])

CodePudding user response:

If you have mongoDB version 5.0 or higher you can use $setWindowFields like in @Takis nice answer. Otherwise, you can group, $slice and $reduce your way to the answer:

  1. $sort to have the highest count on top and group to put them all in one array called all and to $sum up.
  2. $slice the all array to keep only the top N.
  3. $reduce the top N to sum them up.
  4. Add the others to the top N array with count sum-sum(topN)
  5. $unwind and format
db.collection.aggregate([
  {$sort: {count: -1}},
  {$group: {_id: null, all: {$push: "$$ROOT"}, sum: {$sum: "$count"}}},
  {$project: {_id: null, sum: 1,  res: {$slice: ["$all", 5]}}},
  {$project: {sum: 1,  res: 1, topN: {
        $reduce: {
          input: "$res",
          initialValue: 0,
          in: {$add: ["$$value", "$$this.count"]}
        }
      }
    }
  },
  {
    $project: {_id: 0, sum: 1, res: {
      $concatArrays: [
        [{_id: "other",  count: {$subtract: ["$sum", "$topN"]}}],
          "$res"
        ]
      }
    }
  },
  {$unwind: "$res"},
  {$project: {_id: "$res._id", count: "$res.count",
     percent: {  $round: [{$multiply: 
        [{$divide: ["$res.count", "$sum"]}, 100]}, 0]
      }
    }
  }
])

Playground example

CodePudding user response:

another way to do it using $facet since $setWindowFields only works with mongodb v5 or later

mongoPlayground

db.collection.aggregate([
    { $sort: { count: -1 } },
    {
      "$facet": {
        others: [
          { "$skip": 5 },
          {
            "$group": {
              "_id": "others",
              "count": { "$sum": "$count" }
            }
          }
        ],
        top5: [ { "$limit": 5 } ]
      }
    },
    {
        "$project": { result: { "$concatArrays": [ "$others", "$top5" ] } }
    },
    {
        "$addFields": { totalCount: { "$sum": "$result.count" } }
    },
    { $unwind: "$result" },
    {
      $project: {
        _id: "$result._id",
        count: "$result.count",
        percent: {
          $round: [
            { "$multiply": [ { $divide: [ "$result.count", "$totalCount" ] }, 100 ] },
            0
          ]
        }
      }
    }
  ])
  • Related