Mongodb aggregate distinct with unique and sort

Time:10-20

I have a Ranks collection with documents that look like this:

[
    {
       "_id": "1",
       "url": "ex1.com",
       "keyword": "k1",
       "rank": 19,
       "createdAt": "2021-06-02",
       "user": "616c542660d23fc17469b47e"
    },
   {
       "_id": "2",
       "url": "ex1.com",
       "keyword": "k1",
       "rank": 14,
       "createdAt": "2021-06-01",
       "user": "616c542660d23fc17469b47e"
    },
    {
       "_id": "3",
       "url": "ex1.com",
       "keyword": "k2",
       "rank": 8,
       "createdAt": "2021-05-01",
       "user": "616c542660d23fc17469b47e"
    },
    {
       "_id": "4",
       "url": "ex2.com",
       "keyword": "k3",
       "rank": 4,
       "createdAt": "2021-05-01",
       "user": "616c542660d23fc17469b47e"
    }
]

And a users collection with documents that look like this:

[
  {
    _id: "616c542660d23fc17469b47e",
    email: "[email protected]"
  }
]

I'm trying to run an aggregation that returns, for each user, the user object plus a data array grouped by url. Each url object should have a keywords array containing only unique keywords, each with its latest rank (by createdAt date).

This is what I tried, but the query returns every keyword entry for each url. How can I make it return only unique keywords, each with its latest (by createdAt date) entry?

Rank.aggregate([
            {
                $match: {}
            },
            {
                $lookup: {
                    from: 'users',
                    localField: 'user',
                    foreignField: '_id',
                    as: 'user'
                }
            },
            {
                $project: {
                    user: {
                        $arrayElemAt: ['$user', 0]
                    },
                    url: '$url',
                    keyword: '$keyword',
                    rank: '$rank',
                    createdAt: '$createdAt',
                }
            },
            {
                $sort: {
                    createdAt: -1
                }
            },
            {
                $group: {
                    _id: '$user._id',
                    user: {
                        $first: '$user'
                    },
                    data: {
                        $push: {
                            id: '$_id',
                            url: '$url',
                            keyword: '$keyword',
                            rank: '$rank',
                            createdAt: '$createdAt',
                        }
                    }
                }
            }
        ])

Expected output:

[{
        user: {
            _id: "616c542660d23fc17469b47e",
            email: "[email protected]"
        },
        data: [
            {
                url: "ex1.com",
                keywords: [
                    {
                        keyword: "k1",
                        rank: 19,
                        createdAt: "2021-06-02",
                    },
                    {
                        keyword: "k2",
                        rank: 8,
                        createdAt: "2021-05-01"
                    },
                ]
            },
            {
                url: "ex2.com",
                keywords: [
                    {
                        keyword: "k3",
                        rank: 4,
                        createdAt: "2021-05-01"
                    },
                ]
            }
        ]
    }]

Answer:

Here is the solution I came up with.

Full explanation:

We group by "$url", "$user" and "$keyword" to get the unique combinations of these fields. At this point all we want is the unique keywords, but we have to carry the user and url fields along because we will group by them later too. Because we first sort by createdAt in descending order, the first document in each group is the most recently created one.

{
    "$sort": {
        "createdAt": -1
    }
},
{
    "$group": {
        "_id": [
            "$url",
            "$user",
            "$keyword"
        ],
        "keywords": {
            $first: "$$ROOT"
        }
    }
},
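
What this stage is meant to do (keep only the newest document per url, user and keyword) can be sketched in plain JavaScript. This is an in-memory illustration, not how MongoDB implements it; the helper name `latestPerKey` and the `USER` constant are assumptions made for the example, and the rows are copied from the question. It relies on the ISO-style date strings comparing correctly as plain strings.

```javascript
const USER = "616c542660d23fc17469b47e";

// Sample documents copied from the question.
const rows = [
  { _id: "1", url: "ex1.com", keyword: "k1", rank: 19, createdAt: "2021-06-02", user: USER },
  { _id: "2", url: "ex1.com", keyword: "k1", rank: 14, createdAt: "2021-06-01", user: USER },
  { _id: "3", url: "ex1.com", keyword: "k2", rank: 8, createdAt: "2021-05-01", user: USER },
  { _id: "4", url: "ex2.com", keyword: "k3", rank: 4, createdAt: "2021-05-01", user: USER },
];

// Hypothetical helper: keep only the newest row per (url, user, keyword).
function latestPerKey(docs) {
  // Sort newest first; copy so the input array is not mutated.
  const sorted = [...docs].sort((a, b) => b.createdAt.localeCompare(a.createdAt));
  const seen = new Map();
  for (const doc of sorted) {
    const key = JSON.stringify([doc.url, doc.user, doc.keyword]);
    // The first document seen for a key is the newest one, like $first after a descending $sort.
    if (!seen.has(key)) seen.set(key, doc);
  }
  return [...seen.values()];
}

console.log(latestPerKey(rows).map((d) => d._id)); // [ '1', '3', '4' ]
```

Document "2" is dropped because document "1" has the same url, user and keyword and a later createdAt.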

Then we reshape this keyword information so that it can be grouped by URL. This step gives us the keywords per URL for each user.

    {
        "$project": {
          "url": "$keywords.url",
          "user": "$keywords.user",
          "keywords": "$keywords",
          "_id": 0
        }
      },
      {
        "$group": {
          "_id": [
            "$user",
            "$url"
          ],
          "data": {
            $push: "$$ROOT"
          }
        }
      },

Finally we group the URLs by user. Notice that each $group stage up to this point has included both the URL and the user in its key, so that those fields are not lost along the way.

  {
    "$project": {
      "url": {
        $first: "$data.keywords.url"
      },
      "user": {
        $first: "$data.keywords.user"
      },
      "keywords": "$data.keywords",
      "_id": 0
    }
  },
  {
    "$group": {
      "_id": "$user",
      "data": {
        $push: "$$ROOT"
      }
    }
  },
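
The two regrouping steps (keywords per URL, then URLs per user) amount to building a user -> url -> keywords nesting. Here is a minimal in-memory sketch of that shape in plain JavaScript; the helper name `nestByUserAndUrl` is an assumption, and the input is the question's sample data after deduplication. Unlike this sketch, which keeps insertion order, `$group` does not guarantee the order of its output documents.

```javascript
// Deduplicated rows: one (newest) entry per url, user and keyword.
const latest = [
  { url: "ex1.com", keyword: "k1", rank: 19, createdAt: "2021-06-02", user: "616c542660d23fc17469b47e" },
  { url: "ex1.com", keyword: "k2", rank: 8, createdAt: "2021-05-01", user: "616c542660d23fc17469b47e" },
  { url: "ex2.com", keyword: "k3", rank: 4, createdAt: "2021-05-01", user: "616c542660d23fc17469b47e" },
];

// Hypothetical helper: nest rows as user -> url -> keywords.
function nestByUserAndUrl(docs) {
  const byUser = new Map();
  for (const { user, url, ...keywordInfo } of docs) {
    if (!byUser.has(user)) byUser.set(user, new Map());
    const byUrl = byUser.get(user);
    if (!byUrl.has(url)) byUrl.set(url, []);
    // user and url live on the outer levels, so only the rest is pushed.
    byUrl.get(url).push(keywordInfo);
  }
  return [...byUser].map(([user, byUrl]) => ({
    user,
    data: [...byUrl].map(([url, keywords]) => ({ url, keywords })),
  }));
}

console.log(JSON.stringify(nestByUserAndUrl(latest), null, 2));
```

The result has one entry per user, each with a data array holding one object per url.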

At this point we have almost all the information we need grouped together. We perform a $lookup to get the email from the users collection, and a final $project removes the redundant data.

  {
    $lookup: {
      from: "users",
      localField: "_id",
      foreignField: "_id",
      as: "user"
    }
  },
  {
    "$unwind": "$user"
  },
  {
    "$project": {
      "_id": 0,
      "data.user": 0,
      "data.keywords._id": 0,
      "data.keywords.url": 0,
      "data.keywords.user": 0
    }
  },
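
Put together, the stages can be collected into one array and passed to `Rank.aggregate`. This is a sketch under a few assumptions: the variable name `pipeline` is made up for the example, the sort is descending so that `$first` in the first `$group` picks the newest document per keyword, and the `$first` array operator used inside `$project` needs MongoDB 4.4 or later. The `$lookup` also assumes that `user` in Ranks and `_id` in users have the same type, as they do in the sample data.

```javascript
const pipeline = [
  // 1. Newest first, then keep one document per (url, user, keyword).
  { $sort: { createdAt: -1 } },
  { $group: { _id: ["$url", "$user", "$keyword"], keywords: { $first: "$$ROOT" } } },

  // 2. Keywords per URL (and user).
  { $project: { url: "$keywords.url", user: "$keywords.user", keywords: "$keywords", _id: 0 } },
  { $group: { _id: ["$user", "$url"], data: { $push: "$$ROOT" } } },

  // 3. URLs per user.
  {
    $project: {
      url: { $first: "$data.keywords.url" },
      user: { $first: "$data.keywords.user" },
      keywords: "$data.keywords",
      _id: 0,
    },
  },
  { $group: { _id: "$user", data: { $push: "$$ROOT" } } },

  // 4. Attach the user document and drop the redundant fields.
  { $lookup: { from: "users", localField: "_id", foreignField: "_id", as: "user" } },
  { $unwind: "$user" },
  {
    $project: {
      _id: 0,
      "data.user": 0,
      "data.keywords._id": 0,
      "data.keywords.url": 0,
      "data.keywords.user": 0,
    },
  },
];

// With Mongoose and the Rank model from the question (assumed to be in scope):
// const result = await Rank.aggregate(pipeline);
console.log(pipeline.map((stage) => Object.keys(stage)[0]).join(" -> "));
```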