Home > Back-end >  PyMongo/MongoDB - Grouping with many null values after unwinding - possible?
PyMongo/MongoDB - Grouping with many null values after unwinding - possible?

Time:07-20

So I got my input data, which looks like this (example excerpt of one document, truncated - there are a lot more attributes):

{'_id': ObjectId('62bab00ea4b4ebd48908457d'),
 'id': 'xxx1234',
 'masterVariant':
  attributes': [
   {'name': 'propertyA', 'value': 'CJS'},
   {'name': 'propertyB', 'value': 300},
   {'name': 'propertyC', 'value': 221},
   {'name': 'propertyE', 'value': 'dasdags'}
  ]
 }

What I'm trying to do is to create a pandas DataFrame out of this data with following operation (using aggregate_pandas_all(), ignore that I'm only using aggregate() here).

list(coll.aggregate([
    {'$unwind': '$masterVariant.attributes'},
    
    {'$project': {
        'id': '$id',
        'propertyA': {
            '$cond': [{'$eq': ['$masterVariant.attributes.name', 'propertyA']}, '$masterVariant.attributes.value', 'None']
        },
        'propertyE': {
            '$cond': [{'$eq': ['$masterVariant.attributes.name', 'propertyE']}, '$masterVariant.attributes.value', 'None']
        },  
    }},
    { '$group': {
        '_id': '$_id',
        'id': {'$first': '$id'},
        'propertyA': {'$first': '$propertyA'},
        'propertyE': {'$first': '$propertyE'},
    }
    }
]

))

I'm trying to extract several of the attributes and create a tuple/row out of it, basically in this case the result should look like this:

id propertyA propertyE
xxx1234 'CJS' 'dasdags'

Issue I'm getting is rows full of 'None' entries. This is due to the fact that unwinding creates a lot of documents for every property/attribute there is and I can't properly use the $first (or $last) aggregation operator. As far as I understand the $unwind operation it would yield something like this (without $group) - correct me if I'm wrong:

| id | propertyA | propertyE |
| -- | --------- | ----------|
| xxx1234| 'CJS'     | 'None'    |
| xxx1234| 'None'    | 'dasdags' |

Anyone know if what I want to achieve is actually possible (easily?)? Hope I somehow formulated my question clearly.

CodePudding user response:

The issue is you are getting null because of $first. It will only take the first element of the array. To resolve this in the $group stage you can use $push with $cond this will ignore null values.

Then add one more $project stage to get the final output

Complete Aggregation query

db.collection.aggregate([
  {
    "$unwind": "$masterVariant.attributes"
  },
  {
    "$project": {
      "id": "$id",
      "propertyA": {
        "$cond": [
          {
            "$eq": [
              "$masterVariant.attributes.name",
              "propertyA"
            ]
          },
          "$masterVariant.attributes.value",
          null
        ]
      },
      "propertyE": {
        "$cond": [
          {
            "$eq": [
              "$masterVariant.attributes.name",
              "propertyE"
            ]
          },
          "$masterVariant.attributes.value",
          null
        ]
      },
      
    }
  },
  {
    "$group": {
      "_id": "$_id",
      "id": {
        "$first": "$id"
      },
      "propertyA": {
        "$push": {
          "$cond": [
            {
              "$ne": [
                "$propertyA",
                null
              ]
            },
            "$propertyA",
            "$$REMOVE"
          ]
        }
      },
      "propertyE": {
        "$push": {
          "$cond": [
            {
              "$ne": [
                "$propertyE",
                null
              ]
            },
            "$propertyE",
            "$$REMOVE"
          ]
        }
      },
      
    }
  },
  {
    "$project": {
      "id": "$id",
      "propertyA": {
        "$first": "$propertyA"
      },
      "propertyE": {
        "$first": "$propertyE"
      },
      
    }
  }
])

Try in Mongodb Playground

  • Related