Home > Back-end >  Aggregate query to get all different condition at single query in mongodb
Aggregate query to get all different condition at single query in mongodb

Time:10-27

A document in my mongodb collection is as follows:

{
  "transaction_id": "e10cc8d64204394cd35329a88dc4ab8f",
  "timestamp": "2021-05-11 17:47:50",
  "client_id": "[email protected]",
  "ip": "172.69.34.31",
  "level": "OK",
  "request": {
    "method": "POST",
    "status_code": 200,
    "status_name": "Ok",
    "url": "/ocr/driver"
  },
  "bytes": 297,
  "message": "Status authenticated successfully"
},
{
  "transaction_id": "e10cc8d64204394cd35329a88dc4ab8f",
  "timestamp": "2021-05-11 17:47:50",
  "client_id": "[email protected]",
  "ip": "172.69.34.31",
  "level": "OK",
  "request": {
    "method": "POST",
    "status_code": 200,
    "status_name": "Ok",
    "url": "/status/driver"
  },
  "bytes": 297,
  "message": "Status authenticated successfully"
}......

I can get the total of each request.url by timestamp(daily-based) with the following query.

db.getCollection('collection').aggregate([ 
    {
       $match: { 
            client_id: { $regex : '^chae@'}, 
           'request.url': { $regex : '^\/ocr'}}
    },
    { '$group' :
        {
            '_id': { $substr: [ "$timestamp", 0, 10 ] },
            count: { $sum: 1 }
        }
    },
    {
        '$sort':
            { 'count': -1}
     }
])

then I get the result of request.url in which contains /^ocr/

_id        |  count
2021-10-01 |  10
2021-09-30 |  15
2021-09-29 |  11
....

It seems ok. However if I want to get a different request.url, I have to send another query. Can I do get the result of all different request.url at a time with one single query?

My desired outcome is something like this.

_id        |  count(ocr) | count(status) | count(something else)
2021-10-01 |  10         | 20            | 56 
2021-09-30 |  15         | 26            | 28 
2021-09-29 |  11         | 87            | 466 

I even don't know if it's possible in mongodb? Please advise me. Thanks.

CodePudding user response:

You can use $facet to get multiple results like this:

If you are using any programming language you can easily create the object for the query to not repeat code.

db.collection.aggregate({
  "$facet": {
    "ocr": [
      // your query here
    ],
    "status": [
      // your query using $regex : '^\/status'
    ],
    "whatever":[
      // your query using $regex : '^\/whatever'
    ]
  }
})

Example here

CodePudding user response:

Query

  • group accepts expression also, so we can calculate the group value
  • here we have 2 group keys
  • those that their email starts with chae and url with /ocr
  • those that their email starts with chae and url with /status
  • we sum those seperatly
  • if you add more cases on the $switch you can do this for many different searches
  • if you dont know those cases when you write the query, you need to contruct this {"$switch" ...} dynamically you can do it with a loop, its easy to do, loop on the variables and create it
  • if you have an index on those fields, you can add a $match on top, like allow only
    (or (chae and /ocr) (chae and /status) ...)

There is also a $facet solution (facet = 1 aggregation per field)

  • if index is used on the $match , $facet or $group similar perfomance (you do regex match so test the index use), but $facet simpler query
  • if not index, $facet will do 1 collection scan per field, so go for the $group i think

Test code here

aggregate(
[{"$group": 
   {"_id": 
     {"reg": 
       {"$switch": 
         {"branches": 
           [{"case": 
              {"$and": 
                [{"$regexMatch": {"input": "$client_id", "regex": "^chae@"}},
                 {"$regexMatch": {"input": "$request.url", "regex": "^/ocr"}}]},
                "then": "chae-ocr"},
           {"case": 
             {"$and": 
               [{"$regexMatch": {"input": "$client_id", "regex": "^chae@"}},
                {"$regexMatch": 
                  {"input": "$request.url", "regex": "^/status"}}]},
                "then": "chae-status"}],
            "default": "other"}},
        "time": {"$substrCP": ["$timestamp", 0, 10]}},
      "count": {"$sum": 1}}},
  {"$project": {"_id": "$_id.time", "count": 1, "searched": "$_id.reg"}}])
  • Related