Home > Net >  OVER PARTITION equivalent in MongoDB
OVER PARTITION equivalent in MongoDB

Time:09-23

I've simplified the scenario for brevity.

The initial data:

| EngineerId | FirstName | LastName | BirthdateOn | CupsOfCoffee | HoursOfSleep |
| ---------- | --------- | -------- | ----------- | ------------ | ------------ |
| 1          | John      | Doe      | 1990-01-01  | 5            | 8            |
| 2          | James     | Bond     | 1990-01-01  | 1            | 6            |
| 3          | Leeroy    | Jenkins  | 2000-06-20  | 16           | 10           |
| 4          | Jane      | Doe      | 2000-06-20  | 8            | 2            |
| 5          | Lorem     | Ipsum    | 2010-12-25  | 4            | 5            |
db.engineers.insertMany([
    { FirstName: 'John', LastName: 'Doe', BirthdateOn: ISODate('1990-01-01'), CupsOfCoffee: 5, HoursOfSleep: 8 },
    { FirstName: 'James', LastName: 'Bond', BirthdateOn: ISODate('1990-01-01'), CupsOfCoffee: 1, HoursOfSleep: 6 },
    { FirstName: 'Leeroy', LastName: 'Jenkins', BirthdateOn: ISODate('2000-06-20'), CupsOfCoffee: 16, HoursOfSleep: 10 },
    { FirstName: 'Jane', LastName: 'Doe', BirthdateOn: ISODate('2000-06-20'), CupsOfCoffee: 8, HoursOfSleep: 2 },
    { FirstName: 'Lorem', LastName: 'Ipsum', BirthdateOn: ISODate('2010-12-25'), CupsOfCoffee: 4, HoursOfSleep: 5 }
])

We want to see:

  • the cups of coffee consumed by the engineer
  • the row number sorted descending by cups of coffee
  • the count of engineers with the same birthdate
  • the sum of coffees consumed by engineers with a common birthdate
  • the average hours of sleep for engineers with a common birthdate

The SQL query is:

SELECT
    FirstName,
    LastName,
    BirthdateOn,
    CupsOfCoffee,
    ROW_NUMBER() OVER (PARTITION BY BirthdateOn ORDER BY CupsOfCoffee DESC) AS 'Row Number',
    COUNT(EngineerId) OVER (PARTITION BY BirthdateOn) AS TotalEngineers,
    SUM(CupsOfCoffee) OVER (PARTITION BY BirthdateOn) AS TotalCupsOfCoffee,
    AVG(HoursOfSleep) OVER (PARTITION BY BirthdateOn) AS AvgHoursOfSleep
FROM Engineers

Resulting in the following:

| FirstName | LastName | BirthdateOn | Row Number | CupsOfCoffee | TotalEngineers | TotalCupsOfCoffee | AvgHoursOfSleep |
| --------- | -------- | ----------- | ---------- | ------------ | -------------- | ----------------- | --------------- |
| John      | Doe      | 1990-01-01  | 1          | 5            | 2              | 6                 | 7               |
| James     | Bond     | 1990-01-01  | 2          | 1            | 2              | 6                 | 7               |
| Leeroy    | Jenkins  | 2000-06-20  | 1          | 16           | 2              | 24                | 6               |
| Jane      | Doe      | 2000-06-20  | 2          | 8            | 2              | 24                | 6               |
| Lorem     | Ipsum    | 2010-12-25  | 1          | 4            | 1              | 4                 | 5               |

I've done quite a bit of reading on the MongoDB Aggregate Pipeline, but haven't been able to find a good solution yet. I understand that this is not SQL and the solution might not yield results in this exact format (although that would be amazing). One thing I've considered is combining the results of an aggregate and the collection, but that's either not possible or I've been searching with the wrong terms. $merge looked promising, but AFAIU it would modify the original collection and that's no good.

I've gotten as far as the following, but the results do not include the "row number", cups consumed by specific engineers, or IDs and names of the engineers.

db.engineers.aggregate([
    {
        $group: {
            _id: '$BirthdateOn',
            TotalEngineers: {
                $count: {  }
            },
            TotalCupsOfCoffee: {
                $sum: '$CupsOfCoffee'
            },
            AvgHoursOfSleep: {
                $avg: '$HoursOfSleep'
            }
        }
    }
])

My thought with combining would be to find all of the engineers and then run the aggregate and "join" it to the engineers by BirthdateOn.

Thank you for any help! It's much appreciated.

CodePudding user response:

You did a good start. To get the input data you can use with the $push operator.

Would be this:

db.engineers.aggregate([
  {
    $group: {
      _id: "$BirthdateOn",
      TotalEngineers: { $count: {} },
      TotalCupsOfCoffee: { $sum: "$CupsOfCoffee" },
      AvgHoursOfSleep: { $avg: "$HoursOfSleep" },
      data: { $push: "$$ROOT" }
    }
  }
])

Regarding proper output try:

db.engineers.aggregate([
  {
    $group: {
      _id: "$BirthdateOn",
      TotalEngineers: { $count: {} },
      TotalCupsOfCoffee: { $sum: "$CupsOfCoffee" },
      AvgHoursOfSleep: { $avg: "$HoursOfSleep" },
      data: { $push: "$$ROOT" }
    }
  },
  { $unwind: "$data" },
  { $replaceWith: { $mergeObjects: ["$$ROOT", "$data"] } }
])

Often it is pointless to run $group and afterwards $unwind which basically revert the former operation.

MongoDB version 5.0 introduced the $setWindowFields stage, which is quite similar to the SQL Windowing function:

I think it would be this one:

db.engineers.aggregate([
  {
    $setWindowFields: {
      partitionBy: "$BirthdateOn",
      sortBy: { CupsOfCoffee: 1 },
      output: {
        TotalEngineers: { $count: {} },
        TotalCupsOfCoffee: { $sum: "$CupsOfCoffee" },
        AvgHoursOfSleep: { $avg: "$HoursOfSleep" },
        "Row Number": { $documentNumber: {} }
      }
    }
  }
])
  • Related