Home > other >  MongoDB double $group aggregation by date and status
MongoDB double $group aggregation by date and status

Time:02-04

I have several documents that looks like this (minus many other irrelevant fields):

  [{
    status: 'open',
    createdDate: 2021-06-17T09:02:58.325Z
  },
  {
    status: 'declined',
    createdDate: 2021-07-25T09:09:15.851Z
  },
  {
    status: 'declined',
    createdDate: 2021-09-22T09:32:14.958Z
  },
  {
    status: 'open',
    createdDate: 2021-09-02T09:45:26.584Z
  },
  {
    status: 'referral',
    createdDate: 2021-09-05T09:46:02.764Z
  }]

For this subgroup of the collection I want to aggregate the next result:

{
    "2021-06" : { submitted: 1, referral: 0, declined: 0},
    "2021-07" : { submitted: 1, referral: 0, declined: 1},
    "2021-08" : { submitted: 0, referral: 0, declined: 0},
    "2021-09" : { submitted: 3, referral: 1, declined: 1},
}

Submitted are the total documents (open, referral and declined). I tried using $group in a couple of ways but it didn't work out. Any suggestions? Thanks!

CodePudding user response:

Query

  • group by the date as string with only the year and month
  • count 3 accumulators, the first always adds, the second and third add only if they see status open(the second) and status(declined) the third
  • replace root and array to object, to make the data key, and the data as nested document, like in your expected output

Test code here

aggregate(
[{"$group":
  {"_id":{"$dateToString":{"date":"$createdDate", "format":"%Y-%m"}},
   "submitted":{"$sum":1},
   "referral":{"$sum":{"$cond":[{"$eq":["$status", "open"]}, 1, 0]}},
   "declined":
   {"$sum":{"$cond":[{"$eq":["$status", "declined"]}, 1, 0]}}}},
 {"$replaceRoot":
  {"newRoot":
   {"$arrayToObject":
    [[{"k":"$_id",
       "v":
       {"submitted":"$submitted",
        "referral":"$referral",
        "declined":"$declined"}}]]}}}])

CodePudding user response:

The OP asked for a solution where values become keys e.g. 2021-06: {.... In the spirit of good data design I might suggest a simpler pipeline that keeps the date value as a value:

db.foo.aggregate([
    // The SAME as answer above!
    {$group: {
        "_id":{"$dateToString":{"date":"$createdDate", "format":"%Y-%m"}},
        "submitted":{"$sum":1},
        "referral":{"$sum":{"$cond":[{"$eq":["$status", "open"]}, 1, 0]}},
        "declined":{"$sum":{"$cond":[{"$eq":["$status", "declined"]}, 1, 0]}}
    }}
]);

to yield

{ "_id" : "2021-06", "submitted" : 1, "referral" : 1, "declined" : 0 }
{ "_id" : "2021-07", "submitted" : 1, "referral" : 0, "declined" : 1 }
{ "_id" : "2021-09", "submitted" : 3, "referral" : 1, "declined" : 1 }

Now you can sort, filter, etc. without resorting to turning keys into rvals via $objectToArray etc. If it is entirely necessary to have a more descriptive key name than _id (and not causing confusion by keeping _id) then append these stages:

    ,{$addFields: {"date":"$_id"}}
    ,{$unset: "_id"}
  •  Tags:  
  • Related