Home > OS >  Fill month gaps based on multiple columns and aggregate data
Fill month gaps based on multiple columns and aggregate data

Time:06-08

I need to aggregate data on a monthly basis and also fill the gaps of months for the whole year with 0 Amounts. The problem is that the aggregation must happen on multiple columns and this makes it tricky (i.e. using date table and left join does not work for me as other data in dataset has matches within different periods that this dataset is missing)

The dataset I have is:

Period Account CompanyType Amount
20220101 11111 Internal 100
20220201 11111 Internal 200
20220601 11111 Internal 300
20221001 11111 Internal 800
20221001 11111 External 100
20221001 11111 External 300
20221201 11111 Internal 100
20220101 22222 External 20
20220601 22222 External 50
20221201 22222 External 30

What i'm looking for is:

Period Account Type Amount
20220101 11111 Internal 100
20220201 11111 Internal 200
20220301 11111 Internal 0
20220401 11111 Internal 0
20220501 11111 Internal 0
20220601 11111 Internal 300
20220701 11111 Internal 0
20220801 11111 Internal 0
20220901 11111 Internal 0
20221001 11111 Internal 800
20221101 11111 Internal 0
20221201 11111 Internal 100
20220101 11111 External 0
20220201 11111 External 0
20220301 11111 External 0
20220401 11111 External 0
20220501 11111 External 0
20220601 11111 External 0
20220701 11111 External 0
20220801 11111 External 0
20220901 11111 External 0
20221001 11111 External 400
20221101 11111 External 0
20221201 11111 External 0
20220101 22222 Internal 20
20220201 22222 Internal 0
20220301 22222 Internal 0
20220401 22222 Internal 0
20220501 22222 Internal 0
20220601 22222 External 50
20220701 22222 External 0
20220801 22222 External 0
20220901 22222 External 0
20221001 22222 External 0
20221101 22222 External 0
20221201 22222 External 30

Lines that are bolded should be created in the result set.

I have tried multiple solutions but just cannot seem to get it to work due to the reason that the dataset has multiple columns and should be aggregated based on all of them (except Amount).

Update: I oversimplified the problem in my example. The sample dataset with all the fields that i am looking into is provided here: Dbfiddle I do have Dates (Calendar) table present and can use it by the way. Note that CompanyType field is tied to Company field and is not relevant in this case.

The result that i am looking for is following: For every combination of ActiveMark, Account, Company(CompanyType), Currency there must be 12 records in total (from January to December). Missing periods should have Amount set as 0

CodePudding user response:

First things first, if you don't have a Calendar Table, invest in one; this'll make things far easier. I assume, in the following, that you both have one (or you're creating one) and it has a column with the day of the month in it (CalendarDay in this case).

With a Calendar, you can then get all the dates you need and then CROSS JOIN it to the other values you need. I assume you also have an Accounts table (as why wouldn't you) and that for the Type it's limited to 'Internal' and External' and so use a VALUES table construct.

This gives you the following:

WITH Dates AS(
    SELECT CT.CalendarDate
    FROM dbo.CalendarTable CT
    WHERE CT.CalendarDate >= '20220101'
      AND CT.CalendarDate < '20230101'
      AND CT.CalendarDate = 1),
AccountDates AS(
    SELECT D.CalendarDate,
           A.Account,
           V.CompanyType
    FROM Dates D
         CROSS JOIN dbo.Accounts A
         CROSS JOIN (VALUES('Internal'),('External'))V(CompanyType))
SELECT AD.CalendarDate AS Period,
       AD.Account,
       AD.CompanyType,
       ISNULL(YT.Amount,0)
FROM AccountsDates AD
     LEFT JOIN dbo.YourTable YT ON AD.CalendarDate = YT.Period
                               AND AD.Account = YT.Account
                               AND AD.CompanyType = YT.CompanyType;

CodePudding user response:

Suppose you have what you've shown to us - one table with data as is.

To get desired result you can use such kind of query:

WITH months(month) AS (
    SELECT * FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12)) AS m(m)
),
accounts(account) AS (
    SELECT DISTINCT account FROM test
),
companyTypes(companyType) AS (
    SELECT DISTINCT companyType FROM test
),
periods(period) AS (
    SELECT DATEFROMPARTS(2022, month, 01) FROM months WHERE NOT EXISTS (
        SELECT 1 FROM test WHERE MONTH(period) = month
    ) UNION ALL (SELECT DISTINCT period FROM Test)
)
SELECT p.period, a.account, ct.companyType, SUM(COALESCE(t.amount, 0)) 
FROM periods p
CROSS JOIN accounts as a
CROSS JOIN companyTypes as ct
LEFT JOIN test as t ON t.period = p.period AND t.account = a.account AND t.companyType = ct.companyType
GROUP BY p.period, a.account, ct.companyType
ORDER BY a.account, ct.companyType, p.period

Please, check working demo

  • Related