Home > database >  MongoDb query to Project only if the value exists in Mongodb collection
MongoDb query to Project only if the value exists in Mongodb collection

Time:02-28

I have the following function with Mongodb query.

def get_module_names(client, course_id):
    data = list(client['table']['modules'].aggregate([
        {
            '$match': {
                'courseId': course_id,
            }

        }, {
            '$project': {
                '_id': 0,
                'moduleId': '$id',
                'module_name': '$name',
                'item_code': 1,
            }
        }
    ]))
    return pd.DataFrame(data)

In the collection if the value item_code doesn't exist, The query returns a DataFrame with two columns moduleId and module_name without item_code. However I need the DataFrame to be returned only if all the projected columns exist in the final output. If however item_code value exists only for few objects then the query needs to return a DataFrame with item_code as a column, with Nans on rows with missing item_code value.

Currently , if item_code doesn't exist in the collection I am getting the following output.

  moduleId    module_name
0   vjcluy  team 1 module
1   yktdky   intro module
2   trxryd        hello_x

But I need an empty DataFrame to be returned if item_code value doesn't exist at all in the collection.

If item code value does exists but only for few documents in the collection, then I need it returned with Nans for documents with missing values.

  moduleId    module_name item_code
0   vjcluy  team 1 module      qn_3
1   yktdky   intro module       NaN
2   trxryd        hello_x       NaN

I need the same condition for all projected columns not just item_code parameter.

CodePudding user response:

If your query only consists of a query and a projection, use .find(), the systax is simpler than .aggregate().

To only include records where a field exists, use the $exists filter operator. https://docs.mongodb.com/manual/reference/operator/query/exists/

Your code simplifies down to:

def get_module_names(client, course_id):
    query = {'courseId': course_id, 'id': {'$exists': True}, 'name': {'$exists': True}}
    projection = {'_id': 0, 'moduleId': '$id', 'module_name': '$name', 'item_code': 1}
    data = client['table']['modules'].find(query, projection)
    return pd.DataFrame(list(data))

CodePudding user response:

It's a bit tedious, but for all the fields where you need null returned when it doesn't exist, you can use "$ifNull".

db.collection.aggregate([
  {
    "$project": {
      "_id": 0,
      "courseId": {
        "$ifNull": [ "$courseId", null ]
      },
      "moduleId": {
        "$ifNull": [ "$moduleId", null ]
      },
      "module_name": {
        "$ifNull": [ "$module_name", null ]
      },
      "item_code": {
        "$ifNull": [ "$item_code", null ]
      }
    }
  }
])

Try it on mongoplayground.net.

  • Related