Home > Back-end >  MongoDB: how to aggregate from multiple collections with same aggregation pipeline
MongoDB: how to aggregate from multiple collections with same aggregation pipeline

Time:09-22

I'm trying to get aggregations with same aggregation pipeline including $match and $group operations from multiple collections.

For example,

with a users collection and collections of questions, answers and comments where every document has authorId and created_at field,

db = [
    'users': [{ _id: 123 }, { _id: 456} ],
    'questions': [
        { authorId: ObjectId('123'), createdAt: ISODate('2022-09-01T00:00:00Z') },
        { authorId: ObjectId('456'), createdAt: ISODate('2022-09-05T00:00:00Z') },
    ],

    'answers': [
        { authorId: ObjectId('123'), createdAt: ISODate('2022-09-05T08:00:00Z') },
        { authorId: ObjectId('456'), createdAt: ISODate('2022-09-01T08:00:00Z') },
    ],
    'comments': [
        { authorId: ObjectId('123'), createdAt: ISODate('2022-09-01T16:00:00Z') },
        { authorId: ObjectId('456'), createdAt: ISODate('2022-09-05T16:00:00Z') },
    ],
]

I want to get counts of documents from each collections with created_at between a given range and grouped by authorId. A desired aggregation result may look like below. The _ids here are ObjectIds of documents in users collection.

\\ match: { createdAt: { $gt: ISODate('2022-09-03T00:00:00Z) } }
[
    { _id: ObjectId('123'), questionCount: 0, answerCount: 1, commentCount: 0 }, 
    { _id: ObjectId('456'), questionCount: 1, answerCount: 0, commentCount: 1 }
]

Currently, I am running aggregation below for each collection, combining the results in the backend service. (I am using Spring Data MongoDB Reactive.) This seems very inefficient.

db.collection.aggregate([
    { $match: { 
        created_at: { $gt: ISODate('2022-09-03T00:00:00Z') }
    }},
    { $group : {
        _id: '$authorId',
        count: {$sum: 1}
    }}
])

How can I get the desired result with one aggregation?

I thought $unionWith or $lookup may help but I'm stuck here.

CodePudding user response:

You can try something like this, using $lookup, here we join users, with all the three collections one-by-one, and then calculate the count:

db.users.aggregate([
  {
    "$lookup": {
      "from": "questions",
      "let": {
        id: "$_id"
      },
      "pipeline": [
        {
          "$match": {
            $expr: {
              "$and": [
                {
                  "$gt": [
                    "$createdAt",
                    ISODate("2022-09-03T00:00:00Z")
                  ]
                },
                {
                  "$eq": [
                    "$$id",
                    "$authorId"
                  ]
                }
              ]
            }
          }
        }
      ],
      "as": "questions"
    }
  },
  {
    "$lookup": {
      "from": "answers",
      "let": {
        id: "$_id"
      },
      "pipeline": [
        {
          "$match": {
            $expr: {
              "$and": [
                {
                  "$gt": [
                    "$createdAt",
                    ISODate("2022-09-03T00:00:00Z")
                  ]
                },
                {
                  "$eq": [
                    "$$id",
                    "$authorId"
                  ]
                }
              ]
            }
          }
        }
      ],
      "as": "answers"
    }
  },
  {
    "$lookup": {
      "from": "comments",
      "let": {
        id: "$_id"
      },
      "pipeline": [
        {
          "$match": {
            $expr: {
              "$and": [
                {
                  "$gt": [
                    "$createdAt",
                    ISODate("2022-09-03T00:00:00Z")
                  ]
                },
                {
                  "$eq": [
                    "$$id",
                    "$authorId"
                  ]
                }
              ]
            }
          }
        }
      ],
      "as": "comments"
    }
  },
  {
    "$project": {
      "questionCount": {
        "$size": "$questions"
      },
      "answersCount": {
        "$size": "$answers"
      },
      "commentsCount": {
        "$size": "$comments"
      }
    }
  }
])

Playground link. In the above query, we use pipelined form of $lookup, to perform join on some custom logic. Learn more about $lookup here.

  • Related