I am new to MongoDB and am having trouble writing a query that finds the latest date for each distinct name and returns the whole document.
I have the following data in my collection (students):
{ "_id" : ObjectId("61479d4bc146b1663a8f2b7d"), "city" : "SAO PAULO", "name" : "ANA", "status" : "ACTIVE", "date1" : ISODate("2020-09-01T08:14:30.000Z") }
{ "_id" : ObjectId("61479d88c146b1663a8f2b7e"), "city" : "SAO PAULO", "name" : "MARIA", "status" : "ACTIVE", "date1" : ISODate("2020-08-01T04:16:00.000Z") }
{ "_id" : ObjectId("61479dc2c146b1663a8f2b7f"), "city" : "RIO DE JANEIRO", "name" : "MARIA", "status" : "ACTIVE", "date1" : ISODate("2021-02-01T11:10:00.000Z") }
{ "_id" : ObjectId("61479df1c146b1663a8f2b80"), "city" : "SAO PAULO", "name" : "MARIA", "status" : "INACTIVE", "date1" : ISODate("2021-02-01T11:15:00.000Z") }
{ "_id" : ObjectId("61479e60c146b1663a8f2b81"), "city" : "BRASILIA", "name" : "JOHH", "status" : "ACTIVE", "date1" : ISODate("2021-06-01T01:18:00.000Z") }
I want to filter on status "ACTIVE" and show only the most recent document for each student, returning just "city", "name" and "date1". I tried this, using $max (or $last) in the $group stage:
db.getCollection('students').aggregate([
    { $match: { status: "ACTIVE" } },
    { $group: { _id: { name: "$name" },
                date1: { $max: "$date1" },
                city: { $max: "$city" } } }
])
The wanted result:
{ "city" : "SAO PAULO", "name" : "ANA", "date1" : ISODate("2020-09-01T08:14:30.000Z") }
{ "city" : "RIO DE JANEIRO", "name" : "MARIA", "date1" : ISODate("2021-02-01T11:10:00.000Z") }
{ "city" : "BRASILIA", "name" : "JOHH", "date1" : ISODate("2021-06-01T01:18:00.000Z") }
But the result is this:
{ "city" : "SAO PAULO", "name" : "ANA", "date1" : ISODate("2020-09-01T08:14:30.000Z") }
{ "city" : "SAO PAULO", "name" : "MARIA", "date1" : ISODate("2021-02-01T11:10:00.000Z") }
{ "city" : "BRASILIA", "name" : "JOHH", "date1" : ISODate("2021-06-01T01:18:00.000Z") }
The data it retrieves is wrong. For ANA and JOHH (only one document each) it is fine. But MARIA has three documents, and I need every field from her document with the latest date. Instead I get "city" : "SAO PAULO" rather than "city" : "RIO DE JANEIRO", because $max is applied to the city field as well, independently of date1, so it returns the highest string rather than the city from the latest document. Every field in $group needs an accumulator, so I cannot simply drop $max.
I don't know how to fix it. How can I get the whole document for the latest date of every distinct name?
CodePudding user response:
You can use this aggregation pipeline:
- First, $match as you have.
- Then $sort to put the desired values in first position. The next stage relies on this order.
- In the $group stage, take the $first value (as the documents are sorted, the first value will be the desired one).
- And last, $project to get the desired output.
db.collection.aggregate([
  {
    "$match": {
      "status": "ACTIVE"
    }
  },
  {
    "$sort": {
      "date1": -1
    }
  },
  {
    "$group": {
      "_id": {
        "name": "$name"
      },
      "date1": {
        "$first": "$date1"
      },
      "city": {
        "$first": "$city"
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "name": "$_id.name",
      "city": 1,
      "date1": 1
    }
  }
])
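Since the question asks for the whole document, one possible variation is to keep the entire sorted document with $first on "$$ROOT" and promote it with $replaceRoot, instead of listing each field. The sketch below is plain JavaScript that runs without a server. The pipeline constant is meant to be passed to db.students.aggregate(pipeline), which is an assumption about how it would be used. The latestPerName function is a hypothetical helper, not part of MongoDB, that only mirrors the sort-then-take-first logic in memory on the sample data from the question.

```javascript
// Pipeline variant that keeps the whole latest document per name.
// Assumption: run in the shell as db.students.aggregate(pipeline).
const pipeline = [
  { $match: { status: "ACTIVE" } },
  { $sort: { date1: -1 } },                                // newest first
  { $group: { _id: "$name", doc: { $first: "$$ROOT" } } }, // keep the first (newest) document
  { $replaceRoot: { newRoot: "$doc" } },                   // promote it back to the top level
  { $project: { _id: 0, city: 1, name: 1, date1: 1 } }
];

// Hypothetical in-memory helper mirroring the same logic, for illustration only.
function latestPerName(docs) {
  // filter() returns a new array, so sorting does not reorder the input.
  const newestFirst = docs
    .filter((d) => d.status === "ACTIVE")
    .sort((a, b) => b.date1 - a.date1);
  const seen = new Map();
  for (const d of newestFirst) {
    // The first document seen for a name is the newest one.
    if (!seen.has(d.name)) {
      seen.set(d.name, { city: d.city, name: d.name, date1: d.date1 });
    }
  }
  return [...seen.values()];
}

// Sample data copied from the question (without _id).
const students = [
  { city: "SAO PAULO", name: "ANA", status: "ACTIVE", date1: new Date("2020-09-01T08:14:30.000Z") },
  { city: "SAO PAULO", name: "MARIA", status: "ACTIVE", date1: new Date("2020-08-01T04:16:00.000Z") },
  { city: "RIO DE JANEIRO", name: "MARIA", status: "ACTIVE", date1: new Date("2021-02-01T11:10:00.000Z") },
  { city: "SAO PAULO", name: "MARIA", status: "INACTIVE", date1: new Date("2021-02-01T11:15:00.000Z") },
  { city: "BRASILIA", name: "JOHH", status: "ACTIVE", date1: new Date("2021-06-01T01:18:00.000Z") }
];

const result = latestPerName(students);
console.log(result);
```

With this variation no field needs its own accumulator, so adding more fields to the output only means changing the $project stage.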