Home > Blockchain >  Aggregation function for Counting of Duplicates in a field based on duplicate items in another field
Aggregation function for Counting of Duplicates in a field based on duplicate items in another field

Time:03-29

I am using mongoengine as ORM with flask application. The model class is define like

class MyData(db.Document):
    task_id = db.StringField(max_length=50, required=True)
    url = db.URLField(max_length=500,required=True,unique=True)
    organization = db.StringField(max_length=250,required=True)
    val = db.StringField(max_length=50, required=True)

The field organization can be repeating and I want to get the count of duplicates with respect to values in another field. For example if the data in mongodb is like

[{"task_id":"as4d2rds5","url":"https:example1.com","organization":"Avengers","val":"null"},
 {"task_id":"rfre43fed","url":"https:example2.com","organization":"Avengers","val":"valid"},
 {"task_id":"uyje3dsxs","url":"https:example3.com","organization":"Metro","val":"valid"},
 {"task_id":"ghs563vt6","url":"https:example4.com","organization":"Avengers","val":"invalid"},
 {"task_id":"erf6egy64","url":"https:example5.com","organization":"Metro","val":"null"}]

Then I am querying all the objects using

data = MyData.objects()

I want a response like

[{"Avengers":{"valid":1,"null":1,"invalid":1}},{"Metro":{"valid":1,"null":1,"invalid":0}}]

I tried like

db.collection.aggregate([
  {
    "$group": {
      "_id": "$organization",
      "count": [
        {
          "null": {
            "$sum": 1
          },
          "valid": {
            "$sum": 1
          },
          "invalid": {
            "$sum": 1
          }
        }
      ]
    }
  }
])

but I am getting an error

The field 'count' must be an accumulator object

CodePudding user response:

Maybe something like this:

db.collection.aggregate([
{
  "$group": {
  "_id": {
    k: "$organization",
    v: "$val"
   },
  "cnt": {
    $sum: 1
   }
  }
 },
 {
  $project: {
    _id: 0,
    k: "$_id.k",
    o: {
      k: "$_id.v",
      v: "$cnt"
    }
   }
  },
 {
   $group: {
    _id: "$k",
    v: {
      $push: "$o"
    }
  }
},
{
  $addFields: {
    v: {
      "$arrayToObject": "$v"
    }
  }
},
{
  $project: {
    _id: 0,
    new: [
      {
        k: "$_id",
        v: "$v"
      }
    ]
  }
},
{
  "$addFields": {
    "new": {
      "$arrayToObject": "$new"
    }
  }
},
{
 "$replaceRoot": {
   "newRoot": "$new"
 }
}
])

Explained:

  1. Group to count
  2. Project for arrayToObject
  3. Group to join the values
  4. arrayToObject one more time
  5. project additionally
  6. arrayToObject to form the final object
  7. project one more time
  8. replaceRoot to move the object to root.

P.S. Please, note this solution is not showing the missing values if they do not exist , if you need the missing values additional mapping / mergeObjects need to be added

playground1

Option with missing values ( if possible values are fixed to null,valid,invalid) : just replace the second addFiedlds with:

   {
   $addFields: {
     v: {
    "$mergeObjects": [
      {
        "null": 0,
        valid: 0,
        invalid: 0
      },
      {
        "$arrayToObject": "$v"
      }
    ]
   }
  }
 }

playground2

  • Related