Home > other >  Query document count by multiple ranges returning range start/end with matching element count
Query document count by multiple ranges returning range start/end with matching element count

Time:04-12

I've been trying to create a query on documents like these:

[
    {
        "timestamp": new ISODate('2020-01-01T00:00:00'),
        "objectId": "Id_A",
        "locationId": "Location_A"
    },
    {
        "timestamp": new ISODate('2021-01-01T00:00:00'),
        "objectId": "Id_A",
        "locationId": "Location_A"
    },
    {
        "timestamp": new ISODate('2022-01-01T00:00:00'),
        "objectId": "Id_A",
        "locationId": "Location_B"
    },
    {
        "timestamp": new ISODate('2021-01-01T00:00:00'),
        "objectId": "Id_B",
        "locationId": "Location_B"
    },
    {
        "timestamp": new ISODate('2022-01-01T00:00:00'),
        "objectId": "Id_A",
        "locationId": "Location_A"
    }
]

given multiple "and" queries, I want to count the matching documents per range

[$or: [
    { $and: [{
        "timestamp": {$gte: new ISODate('2020-01-01T00:00:00'),
                      $lt: new ISODate('2020-12-31T00:00:00'),
        },
        "objectId": "Id_A",
        "locationId": "Location_A"}]},
    { $and: [{
        "timestamp": {$gte: new ISODate('2020-01-01T00:00:00'),
                      $lt: new ISODate('2022-12-31T00:00:00'),
        },
        "objectId": "Id_A",
        "locationId": "Location_A"}]},
    { $and: [{
        "timestamp": {$gte: new ISODate('2022-01-01T00:00:00'),
                      $lt: new ISODate('2022-12-31T00:00:00'),
        },
        "objectId": "Id_A",
        "locationId": "Location_B"}]}
    ]
]

I want to map the counts to a result structure that looks like this

[
    {"objectId": "Id_A", "locationId": "Location_A", "rangeStart:": new ISODate('2020-01-01T00:00:00'), "rangeEnd": new ISODate('2020-12-31T00:00:00'), "count": 1},
    {"objectId": "Id_A", "locationId": "Location_A", "rangeStart:": new ISODate('2020-01-01T00:00:00'), "rangeEnd": new ISODate('2022-12-31T00:00:00'), "count": 3},
    {"objectId": "Id_A", "locationId": "Location_B", "rangeStart:": new ISODate('2022-01-01T00:00:00'), "rangeEnd": new ISODate('2022-12-31T00:00:00'), "count": 1},
]

Aggregations I looked at so far:

  • bucket
  • facet
  • group

But I still can't figure it out. What would your approach be?

CodePudding user response:

As hinted at in the comments, $facet will do the trick. Note the use of year-only constructors for ISODate for simplicity. The output does not match the target above exactly but since the bucket definitions are all hardcoded, it it trivial to add an $addfields stage after $count to put all the input criteria into the output doc.

db.foo.aggregate([
    {$facet: {
        "first_bucket": [
            {$match: {"objectId":"Id_A",
                      "locationId":"Location_A",
                      "timestamp": {$gte: new ISODate('2020-01-01'),
                                    $lt: new ISODate('2020-12-31')}
                     }},
            {$count: "N"}
        ],

        "second_bucket": [
            {$match: {"objectId":"Id_A",
                      "locationId":"Location_A",
                      "timestamp": {$gte: new ISODate('2020-01-01'),
                                    $lt: new ISODate('2022-12-31')}
                     }},
            {$count: "N"}
        ],

        "third_bucket": [
            {$match: {"objectId":"Id_A",
                      "locationId":"Location_B",
                      "timestamp": {$gte: new ISODate('2022-01-01'),
                                    $lt: new ISODate('2022-12-31')}
                     }},
            {$count: "N"}
        ]
    }}
]);

A slightly more interesting variation is to use $group in the $facet expression. This will produce more entries in the buckets but has the advantage of hardcoding only the date ranges.

db.foo.aggregate([
    {$facet: {
        "first_bucket": [
            {$match: {"timestamp": {$gte: new ISODate('2020-01-01'),
                                    $lt: new ISODate('2020-12-31')}
                     }},
            {$group: {_id: {objectId: "$objectId", locationId:"$locationId"},
                      N: {$sum:1}}}
        ],
        "second_bucket": [
            {$match: {"timestamp": {$gte: new ISODate('2020-01-01'),
                                    $lt: new ISODate('2022-12-31')}
                     }},
            {$group: {_id: {objectId: "$objectId", locationId:"$locationId"},
                      N: {$sum:1}}}
        ],
        "third_bucket": [
            {$match: {"timestamp": {$gte: new ISODate('2022-01-01'),
                                    $lt: new ISODate('2022-12-31')}
                     }},
            {$group: {_id: {objectId: "$objectId", locationId:"$locationId"},
                      N: {$sum:1}}}
        ]

    }}
]);

CodePudding user response:

Query

  • you can add one $match as first stage to keep only the valid ranges (this can use index also)
  • group on objectId and locationId and conditional range
  • i didn't tested the bellow query because i dont have sample data, if doesn't work and you are stuck, if you can add sample data and expected output

*$facet could be used but facet has those problems (test it to see what is better for your query)

  • doesn't use index (even if match is the first stage)
  • runs the pipeline multiple times, 1 time per field
aggregate(
[{"$group":
   {"_id":
     {"objectId":"$objectId",
      "locationId":"$locationId",
      "range":
       {"$switch":
         {"branches":
           [{"case":
               {"$and":
                 [{"$gte":["$timestamp", ISODate("2020-01-01T00:00:00Z")]},
                   {"$lt":["$timestamp", ISODate("2020-12-31T00:00:00Z")]}]},
              "then":
               {"rangeStart":ISODate("2020-01-01T00:00:00Z"),
                "rangeEnd":ISODate("2020-12-31T00:00:00Z")}},
             {"case":
               {"$and":
                 [{"$gte":["$timestamp", ISODate("2020-01-01T00:00:00Z")]},
                   {"$lt":["$timestamp", ISODate("2020-12-31T00:00:00Z")]}]},
              "then":
               {"rangeStart":ISODate("2020-01-01T00:00:00Z"),
                "rangeEnd":ISODate("2020-12-31T00:00:00Z")}},
             {"case":
               {"$and":
                 [{"$gte":["$timestamp", ISODate("2020-01-01T00:00:00Z")]},
                   {"$lt":["$timestamp", ISODate("2020-12-31T00:00:00Z")]}]},
              "then":
               {"rangeStart":ISODate("2020-01-01T00:00:00Z"),
                "rangeEnd":ISODate("2020-12-31T00:00:00Z")}}],
          "default":"out-of-range"}}},
    "count":{"$sum":1}}},
 {"$project":
   {"_id":0,
    "count":1,
    "objectId":"$_id.objectId",
    "locationId":"$_id.locationId",
    "range":"$_id.range"}}])
  • Related