Home > database >  MongoDB Aggregation for every 10 weeks between week 1 2021 (including) and week 45 2021 (excluding)
MongoDB Aggregation for every 10 weeks between week 1 2021 (including) and week 45 2021 (excluding)

Time:08-29

Let us say for example I have a collection which contains car sales information for a manufacturer worldwide.

Timeline     Country    Sales
2021-W01       A          10
2021-W02       B          20
2021-W03       C          30
…
2022-W33       Z          50

Now I would like the aggregation to compute total sales for every 10 weeks between week 1 2021 (including) and week 45 2021 (excluding).

Desired Output:

{
    "result": [
        {
            "start": "2021-W01",
            "end": "2021-W10",
            "totalSales": 100
        },
        {
            "start": "2021-W10",
            "end": "2021-W20",
            "totalSales": 20
        },
        …
        {
            "start": "2021-W40",
            "end": "2021-W45",
            "totalSales": 1
        }
    ]
}

For this so far, I have come up with this solution.

db.collection.aggregate([
    {"$match": {"$and":[{"Country": "A"}, {"Timeline": {"$in": [‘2021-W01’, ‘2021-W11’, … ‘2021-W45’]}}]}},
    {"$group": {"_id": {Timeline: "$Timeline", totalSales: {"$sum": "$Sales"}}}},
    {"$project": {"_id": 0, result: "$_id"}}
])

But this is producing output like this

[
    {
        "result": {
            "Timeline": "2021-W01",
            "totalSales": 10
        }
    },
    {
        "result": {
            "Timeline": "2021-W02",
            "totalSales": 20
        }
    },
    …
    {
    "result": {
            "Timeline": "2021-W45",
            "totalSales": 23
        }
    }
]

I am unable to get aggregated results for every 10 weeks as this is only doing it for every week. If possible, I kindly request everyone to help me understand this. Thanks.

Additional Information: (Requested by @WernfriedDomscheit)

  1. Is the timeline according to ISO-8601? (Note, actual year might be different to ISO-Week)

    Yes, it is. Only weeks are allowed(For example "2022-W34"). (yyyy-Www format)

  2. Sample Input

    {
      "records": [{
          "Timeline": "2021-W01",
          "Sales": 10,
          "Country": "A"
       }, {
          "Timeline": "2021-W02",
          "Sales": 11,
          "Country": "A"
       }, {
          "Timeline": "2021-W03",
          "Sales": 12,
          "Country": "A"
       }, {
          "Timeline": "2021-W04",
          "Sales": 13,
          "Country": "A"
       }]
    }
  1. Expected Output
{
    "result": [{
        "start": "2021-W01",
        "end": "2021-W03",
        "totalSales": 21
    }, {
        "start": "2021-W03",
        "end": "2021-W04",
        "totalSales": 12
    }]
}

Explanation for desired output:

Let's assume we want the aggregate the total sales for 2 weeks between 2021-W01(including) and 2021-W04 (excluding).

  • Add week1 sales and week2 sales => 10 11 => 21 ( startweek: 2021-W01 (including) and endweek 2021-W03 (excluding) ).
  • Take week3 sales only => 12 ( startweek: 2021-W03 (including) and endweek 2021-W04 (excluding) ).
  1. Where does 2021-W01 and 2021-W45 come from? Static input values or based on collection data. Why 1-10, 10-20, 20-30, 30-40, 40-45 (and not 40-50) ?.

    It is a collection data. For the last result, it is 40-45 and not 40-50 because I mentioned that we need to compute total sales for every 10 weeks between week 1 2021 (including) and week 45 2021 (excluding). So even if the collection has data till 2022-W04 we stop at week 45 2021 (excluding). This means the last result will only aggregate for the 2021-W40 - 2021-W44 range.

  2. Do you have always at least one entry per week?

    No. Certain countries may have missed feeding the data for a week.

CodePudding user response:

Why do you like to exclude 2021-W04? It is not consistent with the previous intervals.

Try this one. First convert string 2021-W04 into a Date. Then convert it to a number, I would suggest YYYYWW format. Then you can define the intervals and group accordingly:

db.collection.aggregate([
  {
    $set: {
      Time: {
        $toInt: {
          $dateToString: {
            date: { $dateFromString: { dateString: "$Timeline", format: "%G-W%V" } },
            format: "%G%V"
          }
        }
      }
    }
  },
  {
    $bucket: {
      groupBy: "$Time",
      boundaries: [ 202101, 202103, 202105 ],
      default: "not found",
      output: {
        totalSales: { $sum: "$Sales" },
        start: { $min: "$Time" },
        end: { $max: "$Time" }            
      }
    }
  }
])

You may need some final cosmetic, check $dateAdd, $dateToString, $dateFromString

Mongo Playground

  • Related