I have a database that looks like:
spider_results = [
{
"_id": {
"$oid": "345g435f543f"
},
"url": "localhost:8080/path1/",
"proxy": {
"proxy": "196.0.1.1:1500",
"protocol": "http",
},
"success": false,
},
{
"_id": {
"$oid": "14544gtf543f"
},
"url": "localhost:8080/path2/",
"proxy": {
"proxy": "196.0.1.1:1500",
"protocol": "http",
},
"success": true,
},
{
"_id": {
"$oid": "t4rw4rww4r"
},
"url": "localhost:8080/path3/",
"proxy": {
"proxy": "192.168.255.255:1500",
"protocol": "http",
},
"success": true,
},
{
"_id": {
"$oid": "ferfrw4456"
},
"url": "localhost:8080/path4/",
"proxy": {
"proxy": "192.168.255.255:1500",
"protocol": "http",
},
"success": true,
},
]
My goal is to count the number of times that each proxy was/wasn't "success." The result would look like:
196.0.1.1:1500 --> success:true = 1
196.0.1.1:1500 --> success:false = 1
192.168.255.255:1500 --> success:true = 2
192.168.255.255:1500 --> success:false = 0
I've been running individual queries like
{"proxy.proxy": "192.168.255.255:1500", success: true}
{"proxy.proxy": "192.168.255.255:1500", success: false}
But as the number of proxies is increasing, this is obviously not sufficient. There is a very good chance I'm not Googling the correct search query for my question, because I've yet to find anyone who is attempting the same type of search.
CodePudding user response:
Query
- we need to group by proxy and count 2 fields
true-success
, andfalse-success
that we don't have, but we can create them based on success field that we already have.success : true
becomes {"true-count": 1, "false-count": 0}success : false
becomes {"true-count": 0, "false-count": 1}
And now we have those 2 fields that we need in each document.
set
to add this extra fieldsgroup
andcount
the true and the false success- make the 2 fields an array and unwind (to separate the fields in different documents)
project
to fix the structure
aggregate(
[{"$set":
{"success":
{"$cond":
["$success",
{"true-count": 1, "false-count": 0},
{"true-count": 0, "false-count": 1}]}}},
{"$group":
{"_id": "$proxy.proxy",
"true-count": {"$sum": "$success.true-count"},
"false-count": {"$sum": "$success.false-count"}}},
{"$set":
{"success":
[{"count": "$true-count", "success": true},
{"count": "$false-count", "success": false}]}},
{"$unwind": {"path": "$success"}},
{"$project":
{"_id": 0,
"proxy": "$_id",
"count": "$success.count",
"success": "$success.success"}}])
CodePudding user response:
You can try it this way:
db.collection.aggregate([
{
$group: {
_id: {
proxy: "$proxy.proxy",
success: "$success"
},
count: {
$sum: 1
}
}
}
])
This is the result for this query:
[
{
"_id": {
"proxy": "196.0.1.1:1500",
"success": true
},
"count": 1
},
{
"_id": {
"proxy": "196.0.1.1:1500",
"success": false
},
"count": 1
},
{
"_id": {
"proxy": "192.168.255.255:1500",
"success": true
},
"count": 2
}
]
PS: This counts number of success of each proxy that's inside your db, and output nothing if there's no success value for that proxy; for example this case won't be output 192.168.255.255:1500 --> success:false = 0
You can try it out here