I need to aggregate data on a monthly basis and also fill the gaps of months for the whole year with 0 Amounts. The problem is that the aggregation must happen on multiple columns and this makes it tricky (i.e. using date table and left join does not work for me as other data in dataset has matches within different periods that this dataset is missing)
The dataset I have is:
Period | Account | CompanyType | Amount |
---|---|---|---|
20220101 | 11111 | Internal | 100 |
20220201 | 11111 | Internal | 200 |
20220601 | 11111 | Internal | 300 |
20221001 | 11111 | Internal | 800 |
20221001 | 11111 | External | 100 |
20221001 | 11111 | External | 300 |
20221201 | 11111 | Internal | 100 |
20220101 | 22222 | External | 20 |
20220601 | 22222 | External | 50 |
20221201 | 22222 | External | 30 |
What i'm looking for is:
Period | Account | Type | Amount |
---|---|---|---|
20220101 | 11111 | Internal | 100 |
20220201 | 11111 | Internal | 200 |
20220301 | 11111 | Internal | 0 |
20220401 | 11111 | Internal | 0 |
20220501 | 11111 | Internal | 0 |
20220601 | 11111 | Internal | 300 |
20220701 | 11111 | Internal | 0 |
20220801 | 11111 | Internal | 0 |
20220901 | 11111 | Internal | 0 |
20221001 | 11111 | Internal | 800 |
20221101 | 11111 | Internal | 0 |
20221201 | 11111 | Internal | 100 |
20220101 | 11111 | External | 0 |
20220201 | 11111 | External | 0 |
20220301 | 11111 | External | 0 |
20220401 | 11111 | External | 0 |
20220501 | 11111 | External | 0 |
20220601 | 11111 | External | 0 |
20220701 | 11111 | External | 0 |
20220801 | 11111 | External | 0 |
20220901 | 11111 | External | 0 |
20221001 | 11111 | External | 400 |
20221101 | 11111 | External | 0 |
20221201 | 11111 | External | 0 |
20220101 | 22222 | Internal | 20 |
20220201 | 22222 | Internal | 0 |
20220301 | 22222 | Internal | 0 |
20220401 | 22222 | Internal | 0 |
20220501 | 22222 | Internal | 0 |
20220601 | 22222 | External | 50 |
20220701 | 22222 | External | 0 |
20220801 | 22222 | External | 0 |
20220901 | 22222 | External | 0 |
20221001 | 22222 | External | 0 |
20221101 | 22222 | External | 0 |
20221201 | 22222 | External | 30 |
Lines that are bolded should be created in the result set.
I have tried multiple solutions but just cannot seem to get it to work due to the reason that the dataset has multiple columns and should be aggregated based on all of them (except Amount).
Update: I oversimplified the problem in my example. The sample dataset with all the fields that i am looking into is provided here: Dbfiddle I do have Dates (Calendar) table present and can use it by the way. Note that CompanyType field is tied to Company field and is not relevant in this case.
The result that i am looking for is following: For every combination of ActiveMark, Account, Company(CompanyType), Currency there must be 12 records in total (from January to December). Missing periods should have Amount set as 0
CodePudding user response:
First things first, if you don't have a Calendar Table, invest in one; this'll make things far easier. I assume, in the following, that you both have one (or you're creating one) and it has a column with the day of the month in it (CalendarDay
in this case).
With a Calendar, you can then get all the dates you need and then CROSS JOIN
it to the other values you need. I assume you also have an Accounts
table (as why wouldn't you) and that for the Type
it's limited to 'Internal'
and External'
and so use a VALUES
table construct.
This gives you the following:
WITH Dates AS(
SELECT CT.CalendarDate
FROM dbo.CalendarTable CT
WHERE CT.CalendarDate >= '20220101'
AND CT.CalendarDate < '20230101'
AND CT.CalendarDate = 1),
AccountDates AS(
SELECT D.CalendarDate,
A.Account,
V.CompanyType
FROM Dates D
CROSS JOIN dbo.Accounts A
CROSS JOIN (VALUES('Internal'),('External'))V(CompanyType))
SELECT AD.CalendarDate AS Period,
AD.Account,
AD.CompanyType,
ISNULL(YT.Amount,0)
FROM AccountsDates AD
LEFT JOIN dbo.YourTable YT ON AD.CalendarDate = YT.Period
AND AD.Account = YT.Account
AND AD.CompanyType = YT.CompanyType;
CodePudding user response:
Suppose you have what you've shown to us - one table with data as is.
To get desired result you can use such kind of query:
WITH months(month) AS (
SELECT * FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12)) AS m(m)
),
accounts(account) AS (
SELECT DISTINCT account FROM test
),
companyTypes(companyType) AS (
SELECT DISTINCT companyType FROM test
),
periods(period) AS (
SELECT DATEFROMPARTS(2022, month, 01) FROM months WHERE NOT EXISTS (
SELECT 1 FROM test WHERE MONTH(period) = month
) UNION ALL (SELECT DISTINCT period FROM Test)
)
SELECT p.period, a.account, ct.companyType, SUM(COALESCE(t.amount, 0))
FROM periods p
CROSS JOIN accounts as a
CROSS JOIN companyTypes as ct
LEFT JOIN test as t ON t.period = p.period AND t.account = a.account AND t.companyType = ct.companyType
GROUP BY p.period, a.account, ct.companyType
ORDER BY a.account, ct.companyType, p.period
Please, check working demo