Mongodb Aggregations - Group by date including condition

Time:12-06

I have a series of documents produced by an aggregation $group stage. This is the result for one document:

{
    "_id": {
        "ip": "79.xxx.xxx.117",
        "myDate": "2022-10-19"
    },
    "date": "2022-10-19",
    "allVisitedPages": [
        {
            "page": "/",
            "time": {
                "time": "2022-10-19T11:35:44.655Z",
                "tz": "-120",
                "_id": "634fe1100a011986b7137da0"
            }
        },
        {
            "page": "/2",
            "time": {
                "time": "2022-10-19T12:14:29.536Z",
                "tz": "-120",
                "_id": "634fea257acb264f23d421f1"
            }
        },
        {
            "page": "/",
            "time": {
                "time": "2022-10-19T15:37:30.002Z",
                "tz": "-120",
                "_id": "634fea266001ea364eeb38ea"
            }
        }
    ],
    "visitedPages": 3,
    "createdAt": "2022-10-19T11:35:44.920Z"
}

I want to get this result (in this case two documents, because the time difference between array positions 2 and 3 is greater than 2 hours):

{
    "_id": {
        "ip": "79.xxx.xxx.117",
        "myDate": "2022-10-19"
    },
    "date": "2022-10-19",
    "allVisitedPages": [
        {
            "page": "/",
            "durationInMinutes": "39",
            "time": {
                "time": "2022-10-19T11:35:44.655Z",
                "tz": "-120",
                "_id": "634fe1100a011986b7137da0"
            }
        },
        {
            "page": "/2",
            "durationInMinutes": "2",
            "time": {
                "time": "2022-10-19T12:14:29.536Z",
                "tz": "-120",
                "_id": "634fea257acb264f23d421f1"
            }
        }
    ],
    "visitedPages": 2
}

{
    "_id": {
        "ip": "79.xxx.xxx.117",
        "myDate": "2022-10-19"
    },
    "date": "2022-10-19",
    "allVisitedPages": [
        {
            "page": "/",
            "durationInMinutes": "2",
            "time": {
                "time": "2022-10-19T15:37:30.002Z",
                "tz": "-120",
                "_id": "634fea266001ea364eeb38ea"
            }
        }
    ],
    "visitedPages": 1
}

I want to start a new group (a new output document) whenever the time between one array position and the next is greater than 2 hours. The last array position should always show "2". I tried $divide and $dateDiff, but this is not possible in the $group stage because it's a unary operator. Another approach I tried was to calculate the sum of the start and end times by dividing, but how can I do this at the array level in the $group stage? Could someone point me in the right direction, if this is possible at all?
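To pin down the splitting rule, here is a minimal in-memory sketch in plain JavaScript (not MongoDB code). The names splitVisits and minutesBetween are hypothetical. It assumes time.time holds ISO-8601 strings as shown above, counts minutes the way $dateDiff with unit "minute" does (minute boundaries crossed, so 11:35:44 to 12:14:29 counts as 39), and returns durationInMinutes as a number rather than a string.

```javascript
// Hypothetical helper: minute boundaries crossed between two ISO timestamps,
// which is how $dateDiff with unit "minute" counts.
function minutesBetween(startIso, endIso) {
  return Math.floor(Date.parse(endIso) / 60000) -
    Math.floor(Date.parse(startIso) / 60000);
}

// Hypothetical helper: split one grouped document into sessions. A visit ends
// its session when the gap to the next visit exceeds maxGapMinutes.
function splitVisits(doc, maxGapMinutes = 120, defaultMinutes = 2) {
  // Sort by timestamp so "the next visit" is well defined.
  const pages = [...doc.allVisitedPages].sort(
    (a, b) => Date.parse(a.time.time) - Date.parse(b.time.time)
  );
  const sessions = [];
  let current = [];
  pages.forEach((visit, i) => {
    const next = pages[i + 1];
    const gap = next ? minutesBetween(visit.time.time, next.time.time) : null;
    const endsSession = gap === null || gap > maxGapMinutes;
    // Duration is the gap to the next visit, or the default for the last
    // visit of a session.
    current.push({
      page: visit.page,
      durationInMinutes: endsSession ? defaultMinutes : gap,
      time: visit.time,
    });
    if (endsSession) {
      sessions.push({
        _id: doc._id,
        date: doc.date,
        allVisitedPages: current,
        visitedPages: current.length,
      });
      current = [];
    }
  });
  return sessions;
}

// The document from the question (time._id fields omitted for brevity).
const sampleDoc = {
  _id: { ip: "79.xxx.xxx.117", myDate: "2022-10-19" },
  date: "2022-10-19",
  allVisitedPages: [
    { page: "/", time: { time: "2022-10-19T11:35:44.655Z", tz: "-120" } },
    { page: "/2", time: { time: "2022-10-19T12:14:29.536Z", tz: "-120" } },
    { page: "/", time: { time: "2022-10-19T15:37:30.002Z", tz: "-120" } },
  ],
};

const result = splitVisits(sampleDoc);
console.log(JSON.stringify(result.map((s) => s.allVisitedPages.map((p) => p.durationInMinutes))));
// → [[39,2],[2]]
```

This reproduces the two expected documents: durations 39 and 2 in the first session, and 2 for the single visit in the second.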

Answer:

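You can group and then reduce, but another option is to use $setWindowFields (available since MongoDB 5.0, as is $dateDiff) to calculate your grouping index before grouping. This pipeline assumes it runs on the original, ungrouped visit documents (one per page view, with ip, date, page and time fields), not on the grouped result shown in the question: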

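db.collection.aggregate([
  // 1. Per ip + date, sorted by visit time: collect the previous and the
  //    current timestamp of each visit document.
  {$setWindowFields: {
      partitionBy: {$concat: ["$ip", "$date"]},
      sortBy: {"time.time": 1},
      output: {prevtime: {
          $push: "$time.time",
          window: {documents: [-1, "current"]}
      }}
  }},
  // 2. Minutes since the previous visit (0 for the first visit, whose window
  //    holds only itself). $toDate also accepts ISO strings, in case time.time
  //    is stored as a string rather than a Date.
  {$addFields: {
      minutesDiff: {
        $toInt: {
          $dateDiff: {
            startDate: {$toDate: {$first: "$prevtime"}},
            endDate: {$toDate: {$last: "$prevtime"}},
            unit: "minute"
          }
        }
      }
  }},
  // 3. Flag visits that start a new session (gap of more than 2 hours).
  {$addFields: {deltaIndex: {$cond: [{$gt: ["$minutesDiff", 120]}, 1, 0]}}},
  // 4. Running sum of the flags = session number within the partition.
  //    Also collect this visit's and the next visit's minutesDiff.
  {$setWindowFields: {
      partitionBy: {$concat: ["$ip", "$date"]},
      sortBy: {"time.time": 1},
      output: {
        groupIndex: {
          $sum: "$deltaIndex",
          window: {documents: ["unbounded", "current"]}
        },
        duration: {
          $push: "$minutesDiff",
          window: {documents: ["current", 1]}
        }
      }
    }
  },
  // 5. Duration = gap to the next visit if it stays in the same session;
  //    otherwise (last visit, or the next visit starts a new session) 2.
  {$set: {
      duration: {
        $cond: [
          {$and: [
              {$eq: [{$size: "$duration"}, 2]},
              {$lte: [{$last: "$duration"}, 120]}
          ]},
          {$last: "$duration"},
          2
        ]
      }
  }},
  // 6. One output document per ip + date + session.
  {$group: {
      _id: {ip: "$ip", myDate: "$date", groupIndex: "$groupIndex"},
      date: {$first: "$date"},
      allVisitedPages: {$push: {page: "$page", durationInMinutes: "$duration", time: "$time"}},
      visitedPages: {$sum: 1}
  }},
  // $group does not guarantee output order, so sort the sessions explicitly.
  {$sort: {"_id.ip": 1, "_id.myDate": 1, "_id.groupIndex": 1}},
  {$unset: "_id.groupIndex"}
])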
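To see why the running $sum over ["unbounded", "current"] yields a session number, here is a hypothetical in-memory trace of the window stages for a single ip + date partition, written in plain JavaScript (traceWindowStages is an illustrative name, not part of MongoDB). It assumes the timestamps are already sorted and, like $dateDiff with unit "minute", counts minute boundaries crossed.

```javascript
// Hypothetical trace of the pipeline's window stages for one partition.
function traceWindowStages(times, maxGapMinutes = 120, defaultMinutes = 2) {
  // Minute index since the epoch; differences count minute boundaries crossed.
  const toMinute = (iso) => Math.floor(Date.parse(iso) / 60000);
  // minutesDiff vs. the previous document; 0 for the first one, because its
  // [-1, "current"] window contains only itself.
  const minutesDiff = times.map((t, i) =>
    i === 0 ? 0 : toMinute(t) - toMinute(times[i - 1])
  );
  // deltaIndex: 1 marks a document that starts a new session.
  const deltaIndex = minutesDiff.map((d) => (d > maxGapMinutes ? 1 : 0));
  // groupIndex: running sum of the flags, like $sum over ["unbounded", "current"].
  let running = 0;
  const groupIndex = deltaIndex.map((d) => (running += d));
  // duration: the next document's minutesDiff (window ["current", 1]) when it
  // exists and stays within the limit, otherwise the default.
  const duration = minutesDiff.map((_, i) => {
    const next = minutesDiff[i + 1];
    return next !== undefined && next <= maxGapMinutes ? next : defaultMinutes;
  });
  return { minutesDiff, deltaIndex, groupIndex, duration };
}

const trace = traceWindowStages([
  "2022-10-19T11:35:44.655Z",
  "2022-10-19T12:14:29.536Z",
  "2022-10-19T15:37:30.002Z",
]);
console.log(JSON.stringify(trace));
```

With the question's three timestamps this gives minutesDiff [0, 39, 203], groupIndex [0, 0, 1] and duration [39, 2, 2], so $group on groupIndex produces the two expected documents.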

See how it works on the playground example
