Home > OS >  MongoDB How do I reference an array?
MongoDB How do I reference an array?

Time:10-07

I'm trying to create a MongoDB database that contains two collections: Students and Courses.

The first collection "students" contains:

from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.Database

student = [{"_id":"0",
          "firstname":"Bert",
           "lastname":"Holden"},
           {"_id":"1",
            "firstname":"Sam",
            "lastname":"Olsen"},
           {"_id":"2",
            "firstname":"James",
            "lastname":"Swan"}]


students = db.students
students.insert_many(student)
pprint.pprint(students.find_one())

The second collection "courses" contains:

from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.Database

course = [{"_id":"10",
           "coursename":"Databases",
           "grades":"[{student_id:0, grade:83.442}, {student_id:1, grade:45.323}, {student_id:2, grade:87.435}]"}]




courses = db.courses
courses.insert_many(course)
pprint.pprint(courses.find_one())

I then want to use aggregation to find a student and the corresponding courses with grade(s).

from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client["Database"]

pipeline = [
    {
        "$lookup": {
            "from": "courses",
            "localField": "_id",
            "foreignField": "student_id",
            "as": "student_course"
            }
    },
    {
        "$match": {
            "_id": "0"
        }
    }
]

pprint.pprint(list(db.students.aggregate(pipeline)))

I'm not sure if the student_id/grade is implemented correctly in the "courses" collection, so that might be one reason why my arregation returns [].

The aggregate works if I create seperate courses for each student, but that seems like a waste of memory, so I would like to have one course with all the student_ids and grades in an array.

Expected output:

 [{'_id': '0',
  'firstname': 'Bert',
  'lastname': 'Holden',
  'student_course': [{'_id': '10',
                      'coursename': 'Databases',
                      'grade': '83.442',
                      'student_id': '0'}]}]

CodePudding user response:

A couple of points worth mentioning..

  1. Your example code in file "courses.py" is inserting grades as a string that represents an array, not an actual array. This was pointed out by Matt in the comments, and you requested an explanation. Here is my attempt to explain - if you insert a string that looks like an array you cannot perform $unwind, or $lookup on sub-elements because they aren't sub-elements, they are part of a string.
  2. You have array data in courses that hold students grades, which are the datapoints that are desired, but you start the aggregation on the student collection. Instead, perhaps change your perspective a bit and come at it from the courses collections instead of the student perspective. If you do, you will may re-qualify the requirement as - "show me all courses and student grades where student id is 0".
  3. Your array data seems to have a datatype mismatch. The student id is an integer in your string variable "array", but the student collection has the student id as a string. Need to be consistent for the $lookup to work properly (if not wanting to perform a bunch of casting).

But, nonetheless, here is a possible solution to your problem. I have revised the python code, including a redefinition of the aggregation...

The name of my test database is pythontest as seen in this code example. This database must exist prior to running the code else an error.

File students.py

from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.pythontest

student = [{"_id":"0",
          "firstname":"Bert",
           "lastname":"Holden"},
           {"_id":"1",
            "firstname":"Sam",
            "lastname":"Olsen"},
           {"_id":"2",
            "firstname":"James",
            "lastname":"Swan"}]


students = db.students
students.insert_many(student)
pprint.pprint(students.find_one())

Then the courses file. Notice the field grades is no longer a string, but is a valid array object? Notice the student id is a string, and not an integer? (In reality, a stronger datatype such as UUID or int would likely be preferable).

File courses.py

from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.pythontest

course = [{"_id":"10",
           "coursename":"Databases",
           "grades": [{ "student_id": "0", "grade": 83.442}, {"student_id": "1", "grade": 45.323}, {"student_id": "2", "grade": 87.435}]}]


courses = db.courses
courses.insert_many(course)
pprint.pprint(courses.find_one())

... and finally, the aggregation file with the changed aggregation pipeline...

File aggregation.py

from pymongo import MongoClient
import pprint

client = MongoClient("mongodb://127.0.0.1:27017")
db = client.pythontest

pipeline = [
    { "$match": { "grades.student_id": "0" } },
    { "$unwind": "$grades" },
    { "$project": { "coursename": 1, "student_id": "$grades.student_id", "grade": "$grades.grade" } },
    {
        "$lookup":
        {
            "from": "students",
            "localField": "student_id",
            "foreignField": "_id",
            "as": "student"
        }
    },
    {
        "$unwind": "$student"
    },
    { "$project": { "student._id": 0 } },
    { "$match": { "student_id": "0" } }
]

pprint.pprint(list(db.courses.aggregate(pipeline)))

Output of running program

> python3 aggregation.py
[{'_id': '10',
  'coursename': 'Databases',
  'grade': 83.442,
  'student': {'firstname': 'Bert', 'lastname': 'Holden'},
  'student_id': '0'}]

The format of the data at the end of the program may not be as desired, but can be tweaked by manipulating the aggregation.

** EDIT **

So if you want to approach this aggregation from the student rather than approaching it from the course you can still perform that aggregation, but because the array is in courses the aggregation will be a bit more complicated. The $lookup must utilize a pipeline itself to prepare the foreign data structures:

Aggregation from Student perspective

db.students.aggregate([
{ $match: { _id: "0" } },
{ $addFields: { "colStudents._id": "$_id" } },
{
    $lookup:
    {
        from: "courses",
        let: { varStudentId: "$colStudents._id"},
        pipeline:
        [
            { $unwind: "$grades" },
            { $match: { $expr: { $eq: ["$grades.student_id", "$$varStudentId" ] } } },
            { $project: { course_id: "$_id", coursename: 1, grade: "$grades.grade", _id: 0} }
        ],
        as: "student_course"
    }
},
{ $project: { _id: 0, student_id: "$_id", firstname: 1, lastname: 1, student_course: 1 } }
])

Output

> python3 aggregation.py
[{'firstname': 'Bert',
  'lastname': 'Holden',
  'student_course': [{'course_id': '10',
                      'coursename': 'Databases',
                      'grade': 83.442}],
  'student_id': '0'}]

CodePudding user response:

I was finally able to take a look at this..

TLDR; see Mongo Playground

This solution requires you to store grades as an actual object vs a string.

Consider the following database structure:

db={
  // Collection
  "students": [
    {
      "_id": "0",
      "firstname": "Bert",
      "lastname": "Holden"
    },
    {
      "_id": "1",
      "firstname": "Sam",
      "lastname": "Olsen"
    },
    {
      "_id": "2",
      "firstname": "James",
      "lastname": "Swan"
    }
  ],
  // Collection
  "courses": [
    {
      "_id": "10",
      "coursename": "Databases",
      "grades": [
        {
          student_id: "0",
          grade: 83.442
        },
        {
          student_id: "1",
          grade: 45.325
        },
        {
          student_id: "2",
          grade: 87.435
        }
      ]
    }
  ],
}

You can achieve what you want using the following query:

db.students.aggregate([
  {
    $match: {
      _id: "0"
    }
  },
  {
    $lookup: {
      from: "courses",
      pipeline: [
        {
          $unwind: "$grades"
        },
        {
          $match: {
            "grades.student_id": "0"
          }
        },
        {
          $group: {
            "_id": "$_id",
            "coursename": {
              $first: "$coursename"
            },
            "grade": {
              $first: "$grades.grade"
            },
            "student_id": {
              $first: "$grades.student_id"
            }
          }
        }
      ],
      as: "student_course"
    }
  }
])
  • Related