Home > Blockchain >  MongoDB For each group select the records with the max value
MongoDB For each group select the records with the max value

Time:05-09

In MongoDB I'm trying to filter a collection down to only those documents that contain the most recent date by their respective group.

In traditional SQL I'd do something like:

Select *
From table a
Join (Select my_group, max(date) as max_date
      From table group by my_group) b 
      ON a.my_group = b.my_group AND
         a.date = b.max_date

With the following sample collection:

[
  {
    "_id": "123",
    "item1": "group 1",
    "item2": "abc",
    "item3": "abc",
    "date": "2022-01-01"
  },
  {
    "_id": "234",
    "item1": "group 1",
    "item2": "abc",
    "item3": "abc",
    "date": "2022-01-02"
  },
  {
    "_id": "345",
    "item1": "group 1",
    "item2": "abc",
    "item3": "abc",
    "date": "2022-01-02"
  },
  {
    "_id": "789",
    "item1": "group 2",
    "item2": "abc",
    "item3": "abc",
    "date": "2022-01-01"
  },
  {
    "_id": "678",
    "item1": "group 2",
    "item2": "abc",
    "item3": "abc",
    "date": "2022-01-02"
  },
  {
    "_id": "456",
    "item1": "group 2",
    "item2": "abc",
    "item3": "abc",
    "date": "2022-01-02"
  }
]

The expected output is:

[
    {
        "_id": "234",
        "date": "2022-01-02",
        "item1": "group 1",
        "item2": "abc",
        "item3": "abc"
    },
    {
        "_id": "345",
        "date": "2022-01-02",
        "item1": "group 1",
        "item2": "abc",
        "item3": "abc"
    },
    {
        "_id": "678",
        "date": "2022-01-02",
        "item1": "group 2",
        "item2": "abc",
        "item3": "abc"
    },
    {
        "_id": "456",
        "date": "2022-01-02",
        "item1": "group 2",
        "item2": "abc",
        "item3": "abc"
    }
]

My current best attempt is:

db.collection.aggregate([
  {
    $group: {
      "_id": "$item1",
      "max_date": {
        $max: "$date"
      },
      "records": {
        $push: "$$ROOT"
      }
    }
  },
  {
    "$project": {
      items: {
        "$filter": {
          "input": "$records",
          "as": "records",
          "cond": {
            $eq: [
              "$$records.date",
              "$max_date"
            ]
          }
        }
      }
    }
  },
  {
    $replaceRoot: {
      newRoot: {
        results: "$items"
      }
    }
  }
])

Unfortunately this returns the results partitioned by group. I've tried a few alternatives suggested by other posts & get a similar problem, eg:

Here's a playground example with the query & sample data.

CodePudding user response:

You're close to the answer.

For the last 2 stages:

  1. $unwind - Deconstruct the items array field to multiple documents.

  2. $replaceWith - Replace the output document with items document.

db.collection.aggregate([
  {
    $group: {
      "_id": "$item1",
      "max_date": {
        $max: "$date"
      },
      "records": {
        $push: "$$ROOT"
      }
    }
  },
  {
    "$project": {
      items: {
        "$filter": {
          "input": "$records",
          "as": "records",
          "cond": {
            $eq: [
              "$$records.date",
              "$max_date"
            ]
          }
        }
      }
    }
  },
  {
    $unwind: "$items"
  },
  {
    $replaceWith: "$items"
  }
])

Sample Mongo Playground


Bonus

Although the query above is better, also would like to share the MongoDB query that is similar to SQL implementation.

  1. $group - Group by item1 and get the max value of date.
  2. $lookup - Self join the collection with item1 and date. And returns items array field.
  3. $match - Filter the document with items not an empty array.
  4. $unwind - Deconstruct the items array into multiple documents.
  5. $replaceWith - Replace the output document with items document.
db.collection.aggregate([
  {
    $group: {
      "_id": "$item1",
      "max_date": {
        $max: "$date"
      }
    }
  },
  {
    $lookup: {
      from: "collection",
      let: {
        item1: "$_id",
        max_date: "$max_date"
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                {
                  $eq: [
                    "$item1",
                    "$$item1"
                  ]
                },
                {
                  $eq: [
                    "$date",
                    "$$max_date"
                  ]
                }
              ]
            }
          }
        }
      ],
      as: "items"
    }
  },
  {
    $match: {
      items: {
        $ne: []
      }
    }
  },
  {
    $unwind: "$items"
  },
  {
    $replaceWith: "$items"
  }
])

Sample Mongo Playground (Bonus)

  • Related