MongoDB - Query same collection twice-CodePudding

I have a posts collection and I want to return the latest posts and featured posts with 1 query.

post document sample

{
"title":"Hello World",
"author":"Bob Paul",
"featured":True,
"published":True,
"created_at":"2019-01-15 10:27:16.354Z"
}

This is what I want returned:

{
"latest": [post1,post2,post3],
"featured": [post7,post10]
}

Seems like you'd have to do 2 match expressions on the same collection. Is that even possible? I was thinking I could extract the featured posts in a different collection and use $unionWith.

CodePudding user response：

Depending on the frequency of this operation and the definitions of the two queries, I would actually suggest sticking with $unionWith mentioned in the original question as opposed to $facet. This is primarily for performance reasons. While it may make the pipeline a little bit less intuitive to read, I suspect that it will perform and scale much better.

The problem with $facet is that it currently has the following behavior:

The $facet stage, and its sub-pipelines, cannot make use of indexes, even if its sub-pipelines use $match or if $facet is the first stage in the pipeline. The $facet stage will always perform a COLLSCAN during execution.

This means that the $facet approach will scan the entire collection every single time this operation executes. This will probably be too slow (and resource intensive) to support the needs of the application.

For the purposes of this answer, let's say that query1 is something like:

[ 
  { '$sort': { created_at: -1 } }, 
  { '$limit': 3 } 
]

And query2:

[ 
  { '$match': { featured: true } } 
]

Obviously your situation may differ, but this is probably reasonably close. In this situation, there would be two important indexes that are relevant:

db.collection.createIndex({created_at:-1})
db.collection.createIndex({featured:1})

Individually, these queries efficiently use their associated indexes:

> db.collection.aggregate(query1).explain().queryPlanner.winningPlan
{
  stage: 'LIMIT',
  limitAmount: 3,
  inputStage: {
    stage: 'FETCH',
    inputStage: {
      stage: 'IXSCAN',
      keyPattern: { created_at: -1 },
      ...
    }
  }
}
> db.collection.aggregate(query2).explain().queryPlanner.winningPlan
{
  stage: 'FETCH',
  inputStage: {
    stage: 'IXSCAN',
    keyPattern: { featured: 1 },
    ...
  }
}

Combined with $facet, however, they cannot:

> db.collection.aggregate([{$facet:{latest:query1, featured:query2}}]).explain().stages[0]['$cursor'].queryPlanner.winningPlan
{ stage: 'COLLSCAN', direction: 'forward' }

Alternatively, we can construct an aggregation that retrieves the same documents via $unionWith as follows:

> let unionWith = query1.concat([{$unionWith:{coll:"collection", pipeline:query2}}])

> unionWith
[
  { '$sort': { created_at: -1 } },
  { '$limit': 3 },
  {
    '$unionWith': {
      coll: 'collection',
      pipeline: [ { '$match': { featured: true } } ]
    }
  }
]

Now the explain shows something very different (many lines removed for brevity/readability):

> db.collection.aggregate(unionWith).explain()
{
  explainVersion: '1',
  stages: [
    {
      '$cursor': {
        queryPlanner: {
          winningPlan: {
            stage: 'LIMIT',
            limitAmount: 3,
            inputStage: {
              stage: 'FETCH',
              inputStage: {
                stage: 'IXSCAN',
                keyPattern: { created_at: -1 },
              }
            }
          },
        },
      },
    },
    {
      '$unionWith': {
        coll: 'collection',
        pipeline: [
          {
            '$cursor': {
              queryPlanner: {
                ...
                winningPlan: {
                  stage: 'FETCH',
                  inputStage: {
                    stage: 'IXSCAN',
                    keyPattern: { featured: 1 },
                  }
                }
              },
            },
          }
        ]
      },
    }
  ]

Both of the nested queries now use the indexes as expected.

If needed, you can append some additional stages to restructure the data into the format originally requested. In fact, we can actually use the $facet stage to do that work. Note that because $facet is no longer at the beginning of the aggregation pipeline it is being fed data from those preceding stages and the warning about a collection scan is no longer relevant.

Based on the incoming data, we can define facet as:

{
  '$facet': {
    latest: [ { '$match': { featured: false } } ],
    featured: [ { '$match': { featured: true } } ]
  }
}

Appending that to the existing aggregation yields the following results:

> db.collection.aggregate(unionWith.concat([facet]))
[
  {
    latest: [
      {
        _id: 'post1',
        featured: false,
        created_at: ISODate("2022-11-29T18:10:38.262Z")
      },
      {
        _id: 'post2',
        featured: false,
        created_at: ISODate("2022-11-29T18:10:35.427Z")
      },
      {
        _id: 'post3',
        featured: false,
        created_at: ISODate("2022-11-29T18:10:32.930Z")
      }
    ],
    featured: [
      {
        _id: 'post10',
        featured: true,
        created_at: ISODate("2022-11-29T18:09:47.367Z")
      },
      {
        _id: 'post7',
        featured: true,
        created_at: ISODate("2022-11-29T18:10:12.927Z")
      }
    ]
  }
]

The final full pipeline in our example is:

[
  { '$sort': { created_at: -1 } },
  { '$limit': 3 },
  {
    '$unionWith': {
      coll: 'collection',
      pipeline: [ { '$match': { featured: true } } ]
    }
  },
  {
    '$facet': {
      latest: [ { '$match': { featured: false } } ],
      featured: [ { '$match': { featured: true } } ]
    }
  }
]

See how it works in this playground example

CodePudding user response：

You can use $facet into an aggregation pipeline to create two outputs on the same query:

db.collection.aggregate([
  {
    "$facet": {
      "latest": [
        {
          // your query 1 here
        }
      ],
      "featured": [
        {
          // your query 2 here
        }
      ]
    }
  }
])

Example here