I have the following function with Mongodb query.
def get_module_names(client, course_id):
data = list(client['table']['modules'].aggregate([
{
'$match': {
'courseId': course_id,
}
}, {
'$project': {
'_id': 0,
'moduleId': '$id',
'module_name': '$name',
'item_code': 1,
}
}
]))
return pd.DataFrame(data)
In the collection if the value item_code doesn't exist, The query returns a DataFrame with two columns moduleId and module_name without item_code. However I need the DataFrame to be returned only if all the projected columns exist in the final output. If however item_code value exists only for few objects then the query needs to return a DataFrame with item_code as a column, with Nans on rows with missing item_code value.
Currently , if item_code doesn't exist in the collection I am getting the following output.
moduleId module_name
0 vjcluy team 1 module
1 yktdky intro module
2 trxryd hello_x
But I need an empty DataFrame to be returned if item_code value doesn't exist at all in the collection.
If item code value does exists but only for few documents in the collection, then I need it returned with Nans for documents with missing values.
moduleId module_name item_code
0 vjcluy team 1 module qn_3
1 yktdky intro module NaN
2 trxryd hello_x NaN
I need the same condition for all projected columns not just item_code parameter.
CodePudding user response:
If your query only consists of a query and a projection, use .find()
, the systax is simpler than .aggregate()
.
To only include records where a field exists, use the $exists
filter operator. https://docs.mongodb.com/manual/reference/operator/query/exists/
Your code simplifies down to:
def get_module_names(client, course_id):
query = {'courseId': course_id, 'id': {'$exists': True}, 'name': {'$exists': True}}
projection = {'_id': 0, 'moduleId': '$id', 'module_name': '$name', 'item_code': 1}
data = client['table']['modules'].find(query, projection)
return pd.DataFrame(list(data))
CodePudding user response:
It's a bit tedious, but for all the fields where you need null
returned when it doesn't exist, you can use "$ifNull"
.
db.collection.aggregate([
{
"$project": {
"_id": 0,
"courseId": {
"$ifNull": [ "$courseId", null ]
},
"moduleId": {
"$ifNull": [ "$moduleId", null ]
},
"module_name": {
"$ifNull": [ "$module_name", null ]
},
"item_code": {
"$ifNull": [ "$item_code", null ]
}
}
}
])
Try it on mongoplayground.net.