Home > database >  MongoDb summing number of distinct elements in string field array
MongoDb summing number of distinct elements in string field array

Time:12-14

I have an ip address collection:

{  
 "_id" : "uezyuLx4jjfvcqN",   
  "CVE" : ["CVE2020-123", "CVE2022-789", "CVE2019-456"],   
  "ip" : "1.2.3.4"   
}
{  
  "_id" : "dCC8GrNdEjym3ryua",   
  "CVE" : ["CVE2020-123", "CVE2021-469"],  
  "ip" : "5.6.7.8"  
}
{  
  "_id" : "dCC8GrNdEjym3ryua",   
  "CVE" : ["CVE2020-123", "CVE2021-469"],  
  "ip" : "7.6.7.6"  
}

I'm trying to calculate the distinct sum of the CVE field, where IPs are in ["5.6.7.8", "1.2.3.4"].

Expected output:

{
 ip: ['1.2.3.4', '5.6.7.8'], 
 sum_distinct_cve:4, 
 CVES: ["CVE2020-123", "CVE2022-789", "CVE2019-456", "CVE2021-469"]
}

So I'm doing the following:

db = db.getSiblingDB("test");
hosts = db.getCollection("test-collection")
hosts.aggregate([
  {$match:
    {"ip": {$in: ["1.2.3.4", "5.6.7.8"]}}},
  {$group: 
    {_id: "$CVE",
    totals: {$sum: "$CVE"}}}
]);

The sum is returning 0, which I've realised is because of MongoDb's behaviour when trying to sum a string field. This is detailed here: mongodb sum query returning zero

What I would like to know though is how I can sum the number of elements, and also find the distinct sum.`

CodePudding user response:

Simple option:

db.collection.aggregate([
{
 $match: {
  "ip": {
    $in: [
      "1.2.3.4",
      "5.6.7.8"
    ]
   }
  }
},
{
  $unwind: "$CVE"
},
{
 $group: {
  _id: "",
  ip: {
    $addToSet: "$ip"
  },
  CVE: {
    $addToSet: "$CVE"
   }
  }
  },
 {
$project: {
  _id: 0,
  ip: 1,
  CVE: 1,
  sum_distinct_cve: {
    $size: "$CVE"
   }
  }
 }
])

Explained:

  1. Match the ip's
  2. unwind the CVE arrays
  3. group so you can join ip and CVE distinct values only
  4. Project the necessary fields and use $size to count the distinct CVE's

Playground

CodePudding user response:

I agree with @R2D2. One more option to avoid $unwind (which considered costly in terms of performance) is to use $reduce instead:

db.collection.aggregate([
  {$match: {ip: {$in: ["1.2.3.4", "5.6.7.8"]}}},
  {$group: {
      _id: 0,
      ip: {$addToSet: "$ip"},
      CVE: {$addToSet: "$CVE"}
  }},
  {$project: {
      _id: 0, ip: 1,
      CVE: {
        $reduce: {
          input: "$CVE",
          initialValue: [],
          in: {$setUnion: ["$$value", "$$this"]}
        }
      }
  }},
  {$set: {sum_distinct_cve: {$size: "$CVE"}}}
])

See how it works on the playground example

  • Related