Let us say for example I have a collection which contains car sales information for a manufacturer worldwide.
Timeline Country Sales
2021-W01 A 10
2021-W02 B 20
2021-W03 C 30
…
2022-W33 Z 50
Now I would like the aggregation to compute total sales for every 10 weeks between week 1 2021 (including) and week 45 2021 (excluding).
Desired Output:
{
"result": [
{
"start": "2021-W01",
"end": "2021-W10",
"totalSales": 100
},
{
"start": "2021-W10",
"end": "2021-W20",
"totalSales": 20
},
…
{
"start": "2021-W40",
"end": "2021-W45",
"totalSales": 1
}
]
}
For this so far, I have come up with this solution.
db.collection.aggregate([
{"$match": {"$and":[{"Country": "A"}, {"Timeline": {"$in": [‘2021-W01’, ‘2021-W11’, … ‘2021-W45’]}}]}},
{"$group": {"_id": {Timeline: "$Timeline", totalSales: {"$sum": "$Sales"}}}},
{"$project": {"_id": 0, result: "$_id"}}
])
But this is producing output like this
[
{
"result": {
"Timeline": "2021-W01",
"totalSales": 10
}
},
{
"result": {
"Timeline": "2021-W02",
"totalSales": 20
}
},
…
{
"result": {
"Timeline": "2021-W45",
"totalSales": 23
}
}
]
I am unable to get aggregated results for every 10 weeks as this is only doing it for every week. If possible, I kindly request everyone to help me understand this. Thanks.
Additional Information: (Requested by @WernfriedDomscheit)
Is the timeline according to ISO-8601? (Note, actual year might be different to ISO-Week)
Yes, it is. Only weeks are allowed(For example "2022-W34"). (yyyy-Www format)
Sample Input
{
"records": [{
"Timeline": "2021-W01",
"Sales": 10,
"Country": "A"
}, {
"Timeline": "2021-W02",
"Sales": 11,
"Country": "A"
}, {
"Timeline": "2021-W03",
"Sales": 12,
"Country": "A"
}, {
"Timeline": "2021-W04",
"Sales": 13,
"Country": "A"
}]
}
- Expected Output
{
"result": [{
"start": "2021-W01",
"end": "2021-W03",
"totalSales": 21
}, {
"start": "2021-W03",
"end": "2021-W04",
"totalSales": 12
}]
}
Explanation for desired output:
Let's assume we want the aggregate the total sales for 2 weeks between 2021-W01(including) and 2021-W04 (excluding).
- Add week1 sales and week2 sales => 10 11 => 21 ( startweek: 2021-W01 (including) and endweek 2021-W03 (excluding) ).
- Take week3 sales only => 12 ( startweek: 2021-W03 (including) and endweek 2021-W04 (excluding) ).
Where does 2021-W01 and 2021-W45 come from? Static input values or based on collection data. Why 1-10, 10-20, 20-30, 30-40, 40-45 (and not 40-50) ?.
It is a collection data. For the last result, it is 40-45 and not 40-50 because I mentioned that we need to compute total sales for every 10 weeks between week 1 2021 (including) and week 45 2021 (excluding). So even if the collection has data till 2022-W04 we stop at week 45 2021 (excluding). This means the last result will only aggregate for the 2021-W40 - 2021-W44 range.
Do you have always at least one entry per week?
No. Certain countries may have missed feeding the data for a week.
CodePudding user response:
Why do you like to exclude 2021-W04? It is not consistent with the previous intervals.
Try this one. First convert string 2021-W04
into a Date
. Then convert it to a number, I would suggest YYYYWW
format. Then you can define the intervals and group accordingly:
db.collection.aggregate([
{
$set: {
Time: {
$toInt: {
$dateToString: {
date: { $dateFromString: { dateString: "$Timeline", format: "%G-W%V" } },
format: "%G%V"
}
}
}
}
},
{
$bucket: {
groupBy: "$Time",
boundaries: [ 202101, 202103, 202105 ],
default: "not found",
output: {
totalSales: { $sum: "$Sales" },
start: { $min: "$Time" },
end: { $max: "$Time" }
}
}
}
])
You may need some final cosmetic, check $dateAdd, $dateToString, $dateFromString