I have a posts collection and I want to return the latest posts and featured posts with 1 query.
post document sample
{
"title":"Hello World",
"author":"Bob Paul",
"featured":True,
"published":True,
"created_at":"2019-01-15 10:27:16.354Z"
}
This is what I want returned:
{
"latest": [post1,post2,post3],
"featured": [post7,post10]
}
Seems like you'd have to do 2 match expressions on the same collection. Is that even possible? I was thinking I could extract the featured posts in a different collection and use $unionWith.
CodePudding user response:
Depending on the frequency of this operation and the definitions of the two queries, I would actually suggest sticking with $unionWith
mentioned in the original question as opposed to $facet
. This is primarily for performance reasons. While it may make the pipeline a little bit less intuitive to read, I suspect that it will perform and scale much better.
The problem with $facet
is that it currently has the following behavior:
The
$facet
stage, and its sub-pipelines, cannot make use of indexes, even if its sub-pipelines use$match
or if$facet
is the first stage in the pipeline. The$facet
stage will always perform aCOLLSCAN
during execution.
This means that the $facet
approach will scan the entire collection every single time this operation executes. This will probably be too slow (and resource intensive) to support the needs of the application.
For the purposes of this answer, let's say that query1
is something like:
[
{ '$sort': { created_at: -1 } },
{ '$limit': 3 }
]
And query2
:
[
{ '$match': { featured: true } }
]
Obviously your situation may differ, but this is probably reasonably close. In this situation, there would be two important indexes that are relevant:
db.collection.createIndex({created_at:-1})
db.collection.createIndex({featured:1})
Individually, these queries efficiently use their associated indexes:
> db.collection.aggregate(query1).explain().queryPlanner.winningPlan
{
stage: 'LIMIT',
limitAmount: 3,
inputStage: {
stage: 'FETCH',
inputStage: {
stage: 'IXSCAN',
keyPattern: { created_at: -1 },
...
}
}
}
> db.collection.aggregate(query2).explain().queryPlanner.winningPlan
{
stage: 'FETCH',
inputStage: {
stage: 'IXSCAN',
keyPattern: { featured: 1 },
...
}
}
Combined with $facet
, however, they cannot:
> db.collection.aggregate([{$facet:{latest:query1, featured:query2}}]).explain().stages[0]['$cursor'].queryPlanner.winningPlan
{ stage: 'COLLSCAN', direction: 'forward' }
Alternatively, we can construct an aggregation that retrieves the same documents via $unionWith
as follows:
> let unionWith = query1.concat([{$unionWith:{coll:"collection", pipeline:query2}}])
> unionWith
[
{ '$sort': { created_at: -1 } },
{ '$limit': 3 },
{
'$unionWith': {
coll: 'collection',
pipeline: [ { '$match': { featured: true } } ]
}
}
]
Now the explain
shows something very different (many lines removed for brevity/readability):
> db.collection.aggregate(unionWith).explain()
{
explainVersion: '1',
stages: [
{
'$cursor': {
queryPlanner: {
winningPlan: {
stage: 'LIMIT',
limitAmount: 3,
inputStage: {
stage: 'FETCH',
inputStage: {
stage: 'IXSCAN',
keyPattern: { created_at: -1 },
}
}
},
},
},
},
{
'$unionWith': {
coll: 'collection',
pipeline: [
{
'$cursor': {
queryPlanner: {
...
winningPlan: {
stage: 'FETCH',
inputStage: {
stage: 'IXSCAN',
keyPattern: { featured: 1 },
}
}
},
},
}
]
},
}
]
Both of the nested queries now use the indexes as expected.
If needed, you can append some additional stages to restructure the data into the format originally requested. In fact, we can actually use the $facet
stage to do that work. Note that because $facet
is no longer at the beginning of the aggregation pipeline it is being fed data from those preceding stages and the warning about a collection scan is no longer relevant.
Based on the incoming data, we can define facet
as:
{
'$facet': {
latest: [ { '$match': { featured: false } } ],
featured: [ { '$match': { featured: true } } ]
}
}
Appending that to the existing aggregation yields the following results:
> db.collection.aggregate(unionWith.concat([facet]))
[
{
latest: [
{
_id: 'post1',
featured: false,
created_at: ISODate("2022-11-29T18:10:38.262Z")
},
{
_id: 'post2',
featured: false,
created_at: ISODate("2022-11-29T18:10:35.427Z")
},
{
_id: 'post3',
featured: false,
created_at: ISODate("2022-11-29T18:10:32.930Z")
}
],
featured: [
{
_id: 'post10',
featured: true,
created_at: ISODate("2022-11-29T18:09:47.367Z")
},
{
_id: 'post7',
featured: true,
created_at: ISODate("2022-11-29T18:10:12.927Z")
}
]
}
]
The final full pipeline in our example is:
[
{ '$sort': { created_at: -1 } },
{ '$limit': 3 },
{
'$unionWith': {
coll: 'collection',
pipeline: [ { '$match': { featured: true } } ]
}
},
{
'$facet': {
latest: [ { '$match': { featured: false } } ],
featured: [ { '$match': { featured: true } } ]
}
}
]
See how it works in this playground example
CodePudding user response:
You can use $facet into an aggregation pipeline to create two outputs on the same query:
db.collection.aggregate([
{
"$facet": {
"latest": [
{
// your query 1 here
}
],
"featured": [
{
// your query 2 here
}
]
}
}
])
Example here