So I got my input data, which looks like this (example excerpt of one document, truncated - there are a lot more attributes):
{'_id': ObjectId('62bab00ea4b4ebd48908457d'),
'id': 'xxx1234',
'masterVariant':
attributes': [
{'name': 'propertyA', 'value': 'CJS'},
{'name': 'propertyB', 'value': 300},
{'name': 'propertyC', 'value': 221},
{'name': 'propertyE', 'value': 'dasdags'}
]
}
What I'm trying to do is to create a pandas DataFrame out of this data with following operation (using aggregate_pandas_all(), ignore that I'm only using aggregate() here).
list(coll.aggregate([
{'$unwind': '$masterVariant.attributes'},
{'$project': {
'id': '$id',
'propertyA': {
'$cond': [{'$eq': ['$masterVariant.attributes.name', 'propertyA']}, '$masterVariant.attributes.value', 'None']
},
'propertyE': {
'$cond': [{'$eq': ['$masterVariant.attributes.name', 'propertyE']}, '$masterVariant.attributes.value', 'None']
},
}},
{ '$group': {
'_id': '$_id',
'id': {'$first': '$id'},
'propertyA': {'$first': '$propertyA'},
'propertyE': {'$first': '$propertyE'},
}
}
]
))
I'm trying to extract several of the attributes and create a tuple/row out of it, basically in this case the result should look like this:
id | propertyA | propertyE |
---|---|---|
xxx1234 | 'CJS' | 'dasdags' |
Issue I'm getting is rows full of 'None' entries. This is due to the fact that unwinding creates a lot of documents for every property/attribute there is and I can't properly use the $first (or $last) aggregation operator. As far as I understand the $unwind operation it would yield something like this (without $group) - correct me if I'm wrong:
| id | propertyA | propertyE |
| -- | --------- | ----------|
| xxx1234| 'CJS' | 'None' |
| xxx1234| 'None' | 'dasdags' |
Anyone know if what I want to achieve is actually possible (easily?)? Hope I somehow formulated my question clearly.
CodePudding user response:
The issue is you are getting null
because of $first
. It will only take the first element of the array.
To resolve this in the $group
stage you can use $push
with $cond
this will ignore null
values.
Then add one more $project
stage to get the final output
Complete Aggregation query
db.collection.aggregate([
{
"$unwind": "$masterVariant.attributes"
},
{
"$project": {
"id": "$id",
"propertyA": {
"$cond": [
{
"$eq": [
"$masterVariant.attributes.name",
"propertyA"
]
},
"$masterVariant.attributes.value",
null
]
},
"propertyE": {
"$cond": [
{
"$eq": [
"$masterVariant.attributes.name",
"propertyE"
]
},
"$masterVariant.attributes.value",
null
]
},
}
},
{
"$group": {
"_id": "$_id",
"id": {
"$first": "$id"
},
"propertyA": {
"$push": {
"$cond": [
{
"$ne": [
"$propertyA",
null
]
},
"$propertyA",
"$$REMOVE"
]
}
},
"propertyE": {
"$push": {
"$cond": [
{
"$ne": [
"$propertyE",
null
]
},
"$propertyE",
"$$REMOVE"
]
}
},
}
},
{
"$project": {
"id": "$id",
"propertyA": {
"$first": "$propertyA"
},
"propertyE": {
"$first": "$propertyE"
},
}
}
])