In MongoDB I'm trying to filter a collection down to only those documents that contain the most recent date by their respective group.
In traditional SQL I'd do something like:
Select *
From table a
Join (Select my_group, max(date) as max_date
From table group by my_group) b
ON a.my_group = b.my_group AND
a.date = b.max_date
With the following sample collection:
[
{
"_id": "123",
"item1": "group 1",
"item2": "abc",
"item3": "abc",
"date": "2022-01-01"
},
{
"_id": "234",
"item1": "group 1",
"item2": "abc",
"item3": "abc",
"date": "2022-01-02"
},
{
"_id": "345",
"item1": "group 1",
"item2": "abc",
"item3": "abc",
"date": "2022-01-02"
},
{
"_id": "789",
"item1": "group 2",
"item2": "abc",
"item3": "abc",
"date": "2022-01-01"
},
{
"_id": "678",
"item1": "group 2",
"item2": "abc",
"item3": "abc",
"date": "2022-01-02"
},
{
"_id": "456",
"item1": "group 2",
"item2": "abc",
"item3": "abc",
"date": "2022-01-02"
}
]
The expected output is:
[
{
"_id": "234",
"date": "2022-01-02",
"item1": "group 1",
"item2": "abc",
"item3": "abc"
},
{
"_id": "345",
"date": "2022-01-02",
"item1": "group 1",
"item2": "abc",
"item3": "abc"
},
{
"_id": "678",
"date": "2022-01-02",
"item1": "group 2",
"item2": "abc",
"item3": "abc"
},
{
"_id": "456",
"date": "2022-01-02",
"item1": "group 2",
"item2": "abc",
"item3": "abc"
}
]
My current best attempt is:
db.collection.aggregate([
{
$group: {
"_id": "$item1",
"max_date": {
$max: "$date"
},
"records": {
$push: "$$ROOT"
}
}
},
{
"$project": {
items: {
"$filter": {
"input": "$records",
"as": "records",
"cond": {
$eq: [
"$$records.date",
"$max_date"
]
}
}
}
}
},
{
$replaceRoot: {
newRoot: {
results: "$items"
}
}
}
])
Unfortunately this returns the results partitioned by group. I've tried a few alternatives suggested by other posts & get a similar problem, eg:
- How to group and select document corresponding to max within each group in MongoDB?
- MongoDB get rows where max value grouped
- Get all rows, groupped and with max value
Here's a playground example with the query & sample data.
CodePudding user response:
You're close to the answer.
For the last 2 stages:
$unwind
- Deconstruct theitems
array field to multiple documents.$replaceWith
- Replace the output document withitems
document.
db.collection.aggregate([
{
$group: {
"_id": "$item1",
"max_date": {
$max: "$date"
},
"records": {
$push: "$$ROOT"
}
}
},
{
"$project": {
items: {
"$filter": {
"input": "$records",
"as": "records",
"cond": {
$eq: [
"$$records.date",
"$max_date"
]
}
}
}
}
},
{
$unwind: "$items"
},
{
$replaceWith: "$items"
}
])
Bonus
Although the query above is better, also would like to share the MongoDB query that is similar to SQL implementation.
$group
- Group byitem1
and get the max value ofdate
.$lookup
- Self join the collection withitem1
anddate
. And returnsitems
array field.$match
- Filter the document withitems
not an empty array.$unwind
- Deconstruct theitems
array into multiple documents.$replaceWith
- Replace the output document withitems
document.
db.collection.aggregate([
{
$group: {
"_id": "$item1",
"max_date": {
$max: "$date"
}
}
},
{
$lookup: {
from: "collection",
let: {
item1: "$_id",
max_date: "$max_date"
},
pipeline: [
{
$match: {
$expr: {
$and: [
{
$eq: [
"$item1",
"$$item1"
]
},
{
$eq: [
"$date",
"$$max_date"
]
}
]
}
}
}
],
as: "items"
}
},
{
$match: {
items: {
$ne: []
}
}
},
{
$unwind: "$items"
},
{
$replaceWith: "$items"
}
])