MongoDB Question Aggregation Without Repetition

Time:12-29

I am learning MongoDB (NoSQL) and I am stuck on a problem.

Consider these documents:

{
    "_id" : ObjectId("63aad45c008cdce77c2c3f9e"),
    "title" : "The Express",
    "year" : 2008,
    "cast" : "Dennis Quaid",
    "genres" : "Sports"
},

{
    "_id" : ObjectId("63aad45c008cdce77c2c3fa0"),
    "title" : "The Express",
    "year" : 2008,
    "cast" : "Rob Brown",
    "genres" : "Sports"
},

{
    "_id" : ObjectId("63aad45c008cdce77c2c3fa2"),
    "title" : "The Express",
    "year" : 2008,
    "cast" : "Omar Benson Miller",
    "genres" : "Sports"
},

{
    "_id" : ObjectId("63aad45c008cdce77c2c416e"),
    "title" : "Semi-Pro",
    "year" : 2008,
    "cast" : "Will Ferrell",
    "genres" : "Sports"
},

{
    "_id" : ObjectId("63aad45c008cdce77c2c4170"),
    "title" : "Semi-Pro",
    "year" : 2008,
    "cast" : "Woody Harrelson",
    "genres" : "Sports"
},

{
    "_id" : ObjectId("63aad45c008cdce77c2c4172"),
    "title" : "Semi-Pro",
    "year" : 2008,
    "cast" : "André Benjamin",
    "genres" : "Sports"
}

I am trying to group by "year" and "genres", and count all "title" without repetition.

The code that I tried is this:

var query1 = {$group: {"_id": { "year": "$year", "genre": "$genres"}, "count": {$sum:1}}}

var stages = [query1]

db.genres.aggregate(stages)

But this groups all the documents, so the value of "count" that I get is six, even though there are only two different titles.

I do not know how to count each title only once.

The expected output is as follows:

{
    "_id": {
        "year": 2008,
        "genre": "Sports"
    },
    "count": 2
}

However, with the code that I tried, the output is this:

{
    "_id": {
        "year": 2008,
        "genre": "Sports"
    },
    "count": 6
}

This is wrong, because I only have two different titles in the documents.

How can I solve this? How can I get titles without repetition and with this output?

Thanks so much! Please ask if you need any more details. I am really stuck and I want to learn how to do this.

CodePudding user response:

I am trying to group by "year" and "genres", and count all "title" without repetition. ... But this is grouping all the documents and the value of "count" that I get is six when I only have two titles different.

It sounds to me like you will need to de-duplicate by title before performing this final count. Assuming that different movies never have the same title, something like this would perform that de-duplication:

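db.collection.aggregate([
  {
    // Stage 1: collapse the per-cast-member documents into one document per title
    $group: {
      _id: "$title",
      year: {
        $first: "$year"
      },
      genres: {
        $first: "$genres"
      }
    }
  },
  {
    // Stage 2: count the de-duplicated titles per year and genre
    $group: {
      "_id": {
        "year": "$year",
        "genre": "$genres"
      },
      "count": {
        $sum: 1
      }
    }
  }
])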

A playground demonstration shows that the output is as expected:

[
  {
    "_id": {
      "genre": "Sports",
      "year": 2008
    },
    "count": 2
  }
]
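The pipeline above relies on the assumption that different movies never share a title. If that assumption is too strong (for example, a remake released in a different year), one possible variation is to de-duplicate on the combination of title, year and genre instead. The following is only a sketch: it assumes the collection is named `genres`, as in the question, and the `pipeline` variable name is introduced here for illustration.

```javascript
// Sketch: de-duplicate on (title, year, genres) rather than on title alone.
const pipeline = [
  {
    // Stage 1: one document per distinct (title, year, genres) combination
    $group: {
      _id: { title: "$title", year: "$year", genres: "$genres" }
    }
  },
  {
    // Stage 2: count those distinct combinations per year and genre
    $group: {
      _id: { year: "$_id.year", genre: "$_id.genres" },
      count: { $sum: 1 }
    }
  }
];

// `db` is predefined in mongosh; the guard only lets the snippet load elsewhere.
// The collection name `genres` is taken from the question.
if (typeof db !== "undefined") {
  printjson(db.genres.aggregate(pipeline).toArray());
}
```

For the six sample documents this should still produce a single group with a count of 2, because both titles share the same year and genre.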

Alternatively you could generate an array with distinct values for the movie titles in your current grouping and then calculate its size afterwards. Again with the same assumption about movie titles from above, something like this:

db.collection.aggregate([
  {
    $group: {
      "_id": {
        "year": "$year",
        "genre": "$genres"
      },
      "count": {
        "$addToSet": "$title"
      }
    }
  },
  {
    "$addFields": {
      "count": {
        $size: "$count"
      }
    }
  }
])

A playground demonstration of this pipeline gives the same output as the previous example.
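To check the expected count without a database, the same idea (collect distinct titles per group, then take the size of the set) can be mimicked in plain JavaScript. This is only an illustrative sketch that mirrors the $addToSet and $size logic; the function name `countDistinctTitles` is made up for this example and is not part of MongoDB.

```javascript
// Sample documents from the question (the _id field is omitted for brevity).
const docs = [
  { title: "The Express", year: 2008, cast: "Dennis Quaid", genres: "Sports" },
  { title: "The Express", year: 2008, cast: "Rob Brown", genres: "Sports" },
  { title: "The Express", year: 2008, cast: "Omar Benson Miller", genres: "Sports" },
  { title: "Semi-Pro", year: 2008, cast: "Will Ferrell", genres: "Sports" },
  { title: "Semi-Pro", year: 2008, cast: "Woody Harrelson", genres: "Sports" },
  { title: "Semi-Pro", year: 2008, cast: "André Benjamin", genres: "Sports" }
];

// Hypothetical helper: counts distinct titles per (year, genre) group.
function countDistinctTitles(documents) {
  const groups = new Map();
  for (const doc of documents) {
    // Composite group key, analogous to _id: { year, genre } in $group
    const key = JSON.stringify([doc.year, doc.genres]);
    if (!groups.has(key)) {
      groups.set(key, {
        _id: { year: doc.year, genre: doc.genres },
        titles: new Set()
      });
    }
    // A Set ignores repeated values, like $addToSet
    groups.get(key).titles.add(doc.title);
  }
  // Replace each set with its size, like $size
  return [...groups.values()].map((g) => ({ _id: g._id, count: g.titles.size }));
}

const result = countDistinctTitles(docs);
console.log(JSON.stringify(result));
// prints [{"_id":{"year":2008,"genre":"Sports"},"count":2}]
```

Counting documents instead of distinct titles in the same loop would give 6, which is the result the original single-stage $group with $sum: 1 produced.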
