Home > Back-end >  MongoDB Sum the Frequency that a Field Value Exists with the Value of Another Field
MongoDB Sum the Frequency that a Field Value Exists with the Value of Another Field

Time:11-26

I have a database that looks like:

spider_results = [
    
{
    "_id": {
        "$oid": "345g435f543f"
    },
    "url": "localhost:8080/path1/",
    "proxy": {
        "proxy": "196.0.1.1:1500",
        "protocol": "http",
    },
    "success": false,
},

{
    "_id": {
        "$oid": "14544gtf543f"
    },
    "url": "localhost:8080/path2/",
    "proxy": {
        "proxy": "196.0.1.1:1500",
        "protocol": "http",
    },
    "success": true,
},

{
    "_id": {
        "$oid": "t4rw4rww4r"
    },
    "url": "localhost:8080/path3/",
    "proxy": {
        "proxy": "192.168.255.255:1500",
        "protocol": "http",
    },
    "success": true,
},

{
    "_id": {
        "$oid": "ferfrw4456"
    },
    "url": "localhost:8080/path4/",
    "proxy": {
        "proxy": "192.168.255.255:1500",
        "protocol": "http",
    },
    "success": true,
},

]

My goal is to count the number of times that each proxy was/wasn't "success." The result would look like:

196.0.1.1:1500         --> success:true  = 1
196.0.1.1:1500         --> success:false = 1
192.168.255.255:1500   --> success:true  = 2
192.168.255.255:1500   --> success:false = 0

I've been running individual queries like

{"proxy.proxy": "192.168.255.255:1500", success: true}

{"proxy.proxy": "192.168.255.255:1500", success: false}

But as the number of proxies is increasing, this is obviously not sufficient. There is a very good chance I'm not Googling the correct search query for my question, because I've yet to find anyone who is attempting the same type of search.

CodePudding user response:

Query

  • we need to group by proxy and count 2 fields true-success, and false-success that we don't have, but we can create them based on success field that we already have.
    • success : true becomes {"true-count": 1, "false-count": 0}
    • success : false becomes {"true-count": 0, "false-count": 1}
      And now we have those 2 fields that we need in each document.
  • set to add this extra fields
  • group and count the true and the false success
  • make the 2 fields an array and unwind (to separate the fields in different documents)
  • project to fix the structure

Test code here

aggregate(
[{"$set": 
    {"success": 
      {"$cond": 
        ["$success",
          {"true-count": 1, "false-count": 0},
          {"true-count": 0, "false-count": 1}]}}},
  {"$group": 
    {"_id": "$proxy.proxy",
      "true-count": {"$sum": "$success.true-count"},
      "false-count": {"$sum": "$success.false-count"}}},
  {"$set": 
    {"success": 
      [{"count": "$true-count", "success": true},
       {"count": "$false-count", "success": false}]}},
  {"$unwind": {"path": "$success"}},
  {"$project": 
    {"_id": 0,
      "proxy": "$_id",
      "count": "$success.count",
      "success": "$success.success"}}])

CodePudding user response:

You can try it this way:

db.collection.aggregate([
  {
    $group: {
      _id: {
        proxy: "$proxy.proxy",
        success: "$success"
      },
      count: {
        $sum: 1
      }
    }
  }
])

This is the result for this query:

[
  {
    "_id": {
      "proxy": "196.0.1.1:1500",
      "success": true
    },
    "count": 1
  },
  {
    "_id": {
      "proxy": "196.0.1.1:1500",
      "success": false
    },
    "count": 1
  },
  {
    "_id": {
      "proxy": "192.168.255.255:1500",
      "success": true
    },
    "count": 2
  }
]

PS: This counts number of success of each proxy that's inside your db, and output nothing if there's no success value for that proxy; for example this case won't be output 192.168.255.255:1500 --> success:false = 0

You can try it out here

  • Related