Home > OS >  MongoDB fill missing dates in aggregation pipeline pagination
MongoDB fill missing dates in aggregation pipeline pagination

Time:10-24

I have this pipeline :

    let pipeline = [

      {
        $group: {
          _id: "$date",
          tasks: { $push: "$$ROOT" },
        },
      },
      {
        $sort: { _id: -1 },
      },
      { 
        $skip: skip //4,8,12,16...etc
      },
      { 
        $limit: 4 
      }
    ];

    const aggregationData = await ScheduleTaskModel.aggregate(pipeline);

where i group all "tasks" by date and i get that result :

[
    {
        "_id": "2022-10-21T00:00:00.000Z",
        "tasks": [...tasks with this date]
    },
    {
        "_id": "2022-10-20T00:00:00.000Z",
        "tasks": [...tasks with this date]
    },
    {
        "_id": "2022-10-18T00:00:00.000Z",
        "tasks": [...tasks with this date]
    },
    {
        "_id": "2022-10-16T00:00:00.000Z",
        "tasks": [...tasks with this date]
    }
]

As you can see,i have missing dates in between dates,which is fine,i can just manipulate the result with simple javascript,create an array with all dates between high and low date bound with empty tasks,and fill the dates that also appear in the result.

The problem lies when i want to "paginate" using $skip,if for example skip to the next 4 groups,i have no way to tell if the next date has any documents,and if it has'nt,i end up with something like the following :

//FIRST RESULT WITH FILLED MISSING DATES
[
    {
        "_id": "2022-10-21T00:00:00.000Z",
        "tasks": [...tasks with this date]
    },
    {
        "_id": "2022-10-20T00:00:00.000Z",
        "tasks": [...tasks with this date]
    },
    {
        "_id": "2022-10-19T00:00:00.000Z",
        "tasks": [] //filled manually
    },
    {
        "_id": "2022-10-18T00:00:00.000Z",
        "tasks": [...tasks with this date]
    },
    {
        "_id": "2022-10-17T00:00:00.000Z",
        "tasks": [] //filled manually
    },
    {
        "_id": "2022-10-16T00:00:00.000Z",
        "tasks": [...tasks with this date]
    }
]

//LOST DAYS IN BETWEEN SKIPS

//SECOND RESULT WITH FILLED MISSING DATES
[
    {
        "_id": "2022-10-14T00:00:00.000Z",
        "tasks": [...tasks with this date]
    },
    {
        "_id": "2022-10-13T00:00:00.000Z",
        "tasks": [...tasks with this date]
    },
    {
        "_id": "2022-10-12T00:00:00.000Z",
        "tasks": [] //filled manually
    },
    {
        "_id": "2022-10-11T00:00:00.000Z",
        "tasks": [...tasks with this date]
    },
    {
        "_id": "2022-10-10T00:00:00.000Z",
        "tasks": [] //filled manually
    },
    {
        "_id": "2022-10-09T00:00:00.000Z",
        "tasks": [...tasks with this date]
    }
]

i still bang my head to overcome this,and unfortunately $densify is out of the question since i use a mongo version before this was introduced

CodePudding user response:

If you're using Mongo version 5.1 you can use the new $densify stage, it does exactly what you want, like so:

db.collection.aggregate([
  {
    $group: {
      _id: "$date",
      tasks: {
        $push: "$$ROOT"
      }
    }
  },
  {
    $densify: {
      field: "_id",
      range: {
        step: 1,
        unit: "day",
        bounds: "full"
      }
    }
  },
  {
    $addFields: {
      tasks: {
        $ifNull: [
          "$tasks",
          []
        ]
      }
    }
  },
  {
    $sort: {
      _id: -1
    },
    
  },
  {
     $skip: n
  },
  {
    $limit: 4
  }
])

Mongo Playground

For a lesser Mongo versions this becomes much much harder, while technically possible I recommend against it, here is a toy example of how I achieved it using Mongo version 4.2 syntax, this is not possible to achieve on earlier versions (unless you're willing not to cast the _id field into a date as return the result "date" as a number and then you can drop the $toDate casting).

This pipeline syntax can become much cleaner using date operators like $dateAdd and $dateDiff but these require version 5.0

The issue is You have to group the entire result set in order to iterate over it and manually "fill" it using $reduce and $map, as you can imagine this is very inefficient:

db.collection.aggregate([
  {
    $group: {
      _id: "$date",
      tasks: {
        $push: "$$ROOT"
      }
    }
  },
  {
    $sort: {
      _id: 1
    }
  },
  {
    $group: {
      _id: null,
      roots: {
        $push: "$$ROOT"
      }
    }
  },
  {
    $addFields: {
      roots: {
        $reduce: {
          input: "$roots",
          initialValue: {
            values: [],
            lastDate: null
          },
          in: {
            lastDate: "$$this._id",
            values: {
              $concatArrays: [
                "$$value.values",
                {
                  $map: {
                    input: {
                      $range: [
                        0,
                        {
                          $round: {
                            $divide: [
                              {
                                "$toDouble": {
                                  $subtract: [
                                    "$$this._id",
                                    {
                                      $ifNull: [
                                        "$$value.lastDate",
                                        {
                                          $subtract: [
                                            "$$this._id",
                                            86400000
                                          ]
                                        }
                                      ]
                                    }
                                  ]
                                }
                              },
                              86400000
                            ]
                          }
                        }
                      ]
                    },
                    as: "dayDiff",
                    in: {
                      $cond: [
                        {
                          $eq: [
                            "$$dayDiff",
                            0
                          ]
                        },
                        "$$this",
                        {
                          tasks: [],
                          _id: {
                            $toDate: {
                              $add: [
                                "$$this._id",
                                {
                                  $multiply: [
                                    {
                                      "$multiply": [
                                        86400000,
                                        "$$dayDiff"
                                      ]
                                    },
                                    -1
                                  ]
                                }
                              ]
                            }
                          }
                        }
                      ]
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  {
    $unwind: "$roots.values"
  },
  {
    $replaceRoot: {
      newRoot: "$roots.values"
    }
  },
  {
    $sort: {
      _id: -1
    }
  }
])

Mongo Playground

  • Related