I've been trying to create a query on documents like these:
[
{
"timestamp": new ISODate('2020-01-01T00:00:00'),
"objectId": "Id_A",
"locationId": "Location_A"
},
{
"timestamp": new ISODate('2021-01-01T00:00:00'),
"objectId": "Id_A",
"locationId": "Location_A"
},
{
"timestamp": new ISODate('2022-01-01T00:00:00'),
"objectId": "Id_A",
"locationId": "Location_B"
},
{
"timestamp": new ISODate('2021-01-01T00:00:00'),
"objectId": "Id_B",
"locationId": "Location_B"
},
{
"timestamp": new ISODate('2022-01-01T00:00:00'),
"objectId": "Id_A",
"locationId": "Location_A"
}
]
given multiple "and" queries, I want to count the matching documents per range
[$or: [
{ $and: [{
"timestamp": {$gte: new ISODate('2020-01-01T00:00:00'),
$lt: new ISODate('2020-12-31T00:00:00'),
},
"objectId": "Id_A",
"locationId": "Location_A"}]},
{ $and: [{
"timestamp": {$gte: new ISODate('2020-01-01T00:00:00'),
$lt: new ISODate('2022-12-31T00:00:00'),
},
"objectId": "Id_A",
"locationId": "Location_A"}]},
{ $and: [{
"timestamp": {$gte: new ISODate('2022-01-01T00:00:00'),
$lt: new ISODate('2022-12-31T00:00:00'),
},
"objectId": "Id_A",
"locationId": "Location_B"}]}
]
]
I want to map the counts to a result structure that looks like this
[
{"objectId": "Id_A", "locationId": "Location_A", "rangeStart:": new ISODate('2020-01-01T00:00:00'), "rangeEnd": new ISODate('2020-12-31T00:00:00'), "count": 1},
{"objectId": "Id_A", "locationId": "Location_A", "rangeStart:": new ISODate('2020-01-01T00:00:00'), "rangeEnd": new ISODate('2022-12-31T00:00:00'), "count": 3},
{"objectId": "Id_A", "locationId": "Location_B", "rangeStart:": new ISODate('2022-01-01T00:00:00'), "rangeEnd": new ISODate('2022-12-31T00:00:00'), "count": 1},
]
Aggregations I looked at so far:
- bucket
- facet
- group
But I still can't figure it out. What would your approach be?
CodePudding user response:
As hinted at in the comments, $facet
will do the trick. Note the use of year-only constructors for ISODate
for simplicity. The output does not match the target above exactly but since the bucket definitions are all hardcoded, it it trivial to add an $addfields
stage after $count
to put all the input criteria into the output doc.
db.foo.aggregate([
{$facet: {
"first_bucket": [
{$match: {"objectId":"Id_A",
"locationId":"Location_A",
"timestamp": {$gte: new ISODate('2020-01-01'),
$lt: new ISODate('2020-12-31')}
}},
{$count: "N"}
],
"second_bucket": [
{$match: {"objectId":"Id_A",
"locationId":"Location_A",
"timestamp": {$gte: new ISODate('2020-01-01'),
$lt: new ISODate('2022-12-31')}
}},
{$count: "N"}
],
"third_bucket": [
{$match: {"objectId":"Id_A",
"locationId":"Location_B",
"timestamp": {$gte: new ISODate('2022-01-01'),
$lt: new ISODate('2022-12-31')}
}},
{$count: "N"}
]
}}
]);
A slightly more interesting variation is to use $group
in the $facet
expression. This will produce more entries in the buckets but has the advantage of hardcoding only the date ranges.
db.foo.aggregate([
{$facet: {
"first_bucket": [
{$match: {"timestamp": {$gte: new ISODate('2020-01-01'),
$lt: new ISODate('2020-12-31')}
}},
{$group: {_id: {objectId: "$objectId", locationId:"$locationId"},
N: {$sum:1}}}
],
"second_bucket": [
{$match: {"timestamp": {$gte: new ISODate('2020-01-01'),
$lt: new ISODate('2022-12-31')}
}},
{$group: {_id: {objectId: "$objectId", locationId:"$locationId"},
N: {$sum:1}}}
],
"third_bucket": [
{$match: {"timestamp": {$gte: new ISODate('2022-01-01'),
$lt: new ISODate('2022-12-31')}
}},
{$group: {_id: {objectId: "$objectId", locationId:"$locationId"},
N: {$sum:1}}}
]
}}
]);
CodePudding user response:
Query
- you can add one
$match
as first stage to keep only the valid ranges (this can use index also) - group on objectId and locationId and conditional range
- i didn't tested the bellow query because i dont have sample data, if doesn't work and you are stuck, if you can add sample data and expected output
*$facet
could be used but facet has those problems (test it to see what is better for your query)
- doesn't use index (even if match is the first stage)
- runs the pipeline multiple times, 1 time per field
aggregate(
[{"$group":
{"_id":
{"objectId":"$objectId",
"locationId":"$locationId",
"range":
{"$switch":
{"branches":
[{"case":
{"$and":
[{"$gte":["$timestamp", ISODate("2020-01-01T00:00:00Z")]},
{"$lt":["$timestamp", ISODate("2020-12-31T00:00:00Z")]}]},
"then":
{"rangeStart":ISODate("2020-01-01T00:00:00Z"),
"rangeEnd":ISODate("2020-12-31T00:00:00Z")}},
{"case":
{"$and":
[{"$gte":["$timestamp", ISODate("2020-01-01T00:00:00Z")]},
{"$lt":["$timestamp", ISODate("2020-12-31T00:00:00Z")]}]},
"then":
{"rangeStart":ISODate("2020-01-01T00:00:00Z"),
"rangeEnd":ISODate("2020-12-31T00:00:00Z")}},
{"case":
{"$and":
[{"$gte":["$timestamp", ISODate("2020-01-01T00:00:00Z")]},
{"$lt":["$timestamp", ISODate("2020-12-31T00:00:00Z")]}]},
"then":
{"rangeStart":ISODate("2020-01-01T00:00:00Z"),
"rangeEnd":ISODate("2020-12-31T00:00:00Z")}}],
"default":"out-of-range"}}},
"count":{"$sum":1}}},
{"$project":
{"_id":0,
"count":1,
"objectId":"$_id.objectId",
"locationId":"$_id.locationId",
"range":"$_id.range"}}])