I have a tree-based structure of SKUs within a Product Hierarchy in SQL Server. The lowest level SKUs will only ever have values (these are consumption values). I then want to generate aggregates up the hierarchy at every level.
Here's is the sample table structure:
Id | ParentId | Name | Volume | IsSku |
---|---|---|---|---|
1 | -1 | All | 0 | 0 |
2 | 1 | Cat A | 0 | 0 |
3 | 1 | Cat B | 0 | 0 |
4 | 2 | Cat A.1 | 0 | 0 |
5 | 2 | Cat A.2 | 0 | 0 |
6 | 3 | Cat B.1 | 0 | 0 |
7 | 3 | Cat B.2 | 0 | 0 |
8 | 4 | SKU1 | 10 | 1 |
9 | 4 | SKU2 | 5 | 1 |
10 | 5 | SKU3 | 7 | 1 |
11 | 5 | SKU4 | 4 | 1 |
12 | 6 | SKU1 | 10 | 1 |
13 | 6 | SKU2 | 5 | 1 |
14 | 7 | SKU3 | 9 | 1 |
15 | 7 | SKU4 | 7 | 1 |
I need a query that will start at the sku level (IsSku=1) and then working up, will sum the SKUs and carry the sum up the product category levels to get a cumulative running total.
I've seen several queries where there are recursive sums in a hierarchical structure where each level already has values, but I need one that will start at the lowest level that has values and recursively calculate the sum as it moves upward.
I was trying these, but they look like they are mainly summing hierarchical data where each node already has a value (Volume, in my case). I need to start at the lowest level and carry the aggregate up as I go up the hierarchy. I tried to emulate the answers in these posts with my data, but wasn't successful so far with my data setup.
The output for the query should be like so:
Id | ParentId | Name | Volume | IsSku |
---|---|---|---|---|
1 | -1 | All | 54 | 0 |
2 | 1 | Cat A | 26 | 0 |
3 | 1 | Cat B | 28 | 0 |
4 | 2 | Cat A.1 | 15 | 0 |
5 | 2 | Cat A.2 | 11 | 0 |
6 | 3 | Cat B.1 | 12 | 0 |
7 | 3 | Cat B.2 | 16 | 0 |
8 | 4 | SKU1 | 10 | 1 |
9 | 4 | SKU2 | 5 | 1 |
10 | 5 | SKU3 | 7 | 1 |
11 | 5 | SKU4 | 4 | 1 |
12 | 6 | SKU1 | 10 | 1 |
13 | 6 | SKU2 | 2 | 1 |
14 | 7 | SKU3 | 9 | 1 |
15 | 7 | SKU4 | 7 | 1 |
I've got a start with a recursive CTE that returns the hierarchy can can aggregate the volume if that node has volume already, but can't seem to figure out how to start at the SKU levels and continue aggregating up the hierarchy.
Here's the start with my CTE:
DECLARE @tblData TABLE
(
[ID] INT NOT NULL,
[ParentId] INT NULL,
[Name] varchar(50) NOT NULL,
[Volume] int NOT NULL,
[IsSku] bit
)
INSERT INTO @tblData
VALUES
(1,-1,'All',0,0)
,(2,1,'Cat A',0,0)
,(3,1,'Cat B',0,0)
,(4,2,'Cat A.1',0,0)
,(5,2,'Cat A.2',0,0)
,(6,3,'Cat B.1',0,0)
,(7,3,'Cat B.2',0,0)
,(8,4,'SKU1',10,1)
,(9,4,'SKU2',5,1)
,(10,5,'SKU3',7,1)
,(11,5,'SKU4',4,1)
,(12,6,'SKU1',10,1)
,(13,6,'SKU2',5,1)
,(14,7,'SKU3',7,1)
,(15,7,'SKU4',4,1)
;WITH cte AS (
SELECT
a.ID
,a.ParentID
,a.Name
,a.Volume
,CAST('/' cast(ID as varchar) '/' as varchar) Node
,0 AS level
,IsSku
FROM @tblData AS a
WHERE a.ParentID = -1
UNION ALL
SELECT
b.ID
,b.ParentID
,b.Name
,b.Volume
,CAST(c.Node CAST(b.ID as varchar) '/' as varchar)
,level = c.level 1
,b.IsSku
FROM @tblData AS b
INNER JOIN cte c
ON b.ParentId = c.ID
)
SELECT c1.ID, c1.ParentID, c1.Name, c1.Node
,ISNULL(SUM(c2.Volume),0)
FROM cte c1
LEFT OUTER JOIN cte c2
ON c1.Node <> c2.Node
AND LEFT(c2.Node, LEN(c1.Node)) = c1.Node
GROUP BY c1.ID, c1.ParentID, c1.Name, c1.Node
Any help is appreciated!
CodePudding user response:
This should do it:
DECLARE @tbl TABLE(Id INT, ParentId INT, Name NVARCHAR(255), Volume INTEGER, IsSku BIT)
INSERT INTO @tbl
VALUES
(1,-1,'All',0,0)
,(2,1,'Cat A',0,0)
,(3,1,'Cat B',0,0)
,(4,2,'Cat A.1',0,0)
,(5,2,'Cat A.2',0,0)
,(6,3,'Cat B.1',0,0)
,(7,3,'Cat B.2',0,0)
,(8,4,'SKU1',10,1)
,(9,4,'SKU2',5,1)
,(10,5,'SKU3',7,1)
,(11,5,'SKU4',4,1)
,(12,6,'SKU1',10,1)
,(13,6,'SKU2',5,1)
,(14,7,'SKU3',7,1)
,(15,7,'SKU4',4,1)
SELECT * FROM @tbl
;
WITH cte AS (
SELECT
Id,ParentId, Name, Volume, IsSku, CAST(Id AS VARCHAR(MAX)) AS Hierarchy
FROM
@tbl
WHERE ParentId=-1
UNION ALL
SELECT
t.Id,t.ParentId, t.Name, t.Volume, t.IsSku, CAST(c.Hierarchy '|' CAST(t.Id AS VARCHAR(MAX)) AS VARCHAR(MAX))
FROM
cte c
INNER JOIN @tbl t
ON c.Id = t.ParentId
)
SELECT Id,ParentId, Name, ChildVolume AS Volume, IsSku
FROM (
SELECT c1.Id, c1.ParentId, c1.Name, c1. Volume, c1.IsSku, SUM(c2.Volume) AS ChildVolume
FROM cte c1
LEFT JOIN cte c2 ON c2.Hierarchy LIKE c1.Hierarchy '%'
GROUP BY c1.Id, c1.ParentId, c1.Name, c1. Volume, c1.IsSku
) x
Basically the computation happens in three steps:
Capture the Hierarchy recursively for each descendant by concatenating the Ids:
CAST(c.Hierarchy '|' CAST(t.Id AS VARCHAR(MAX)) AS VARCHAR(MAX))
Join the resulting table with itself so each record is joined with itself and all its descendants:
FROM cte c1 LEFT JOIN cte c2 ON c2.Hierarchy LIKE c1.Hierarchy '%'
Finally aggregate the Volume of each hierarchy by grouping:
SUM(c2.Volume) AS ChildVolume
This is in reference to Ed Harper's answer to a similar question here: Hierarchy based aggregation
CodePudding user response:
Due to the way that recursive CTEs work in SQL Server, it is very difficult to get this kind of logic working efficiently. It often either requires self-joining the whole resultset, or using something like JSON or XML.
The problem is that at each recursion of the CTE, although it appears you are working on the whole set at once, it actually only feeds back one row at a time. Therefore grouping is disallowed over the recursion.
Instead, it's much better to simply recurse with a WHILE
loop and insert into a temp table or table variable, then read it back to aggregate
Use the
OUTPUT
clauses to view the intermediate results
DECLARE @tmp TABLE (
Id INTEGER,
ParentId INTEGER,
Name VARCHAR(7),
Volume INTEGER,
IsSku INTEGER,
Level INT,
INDEX ix CLUSTERED (Level, ParentId, Id)
);
INSERT INTO @tmp
(Id, ParentId, Name, Volume, IsSku, Level)
-- OUTPUT inserted.Id, inserted.ParentId, inserted.Name, inserted.Volume, inserted.IsSku, inserted.Level
SELECT
p.Id,
p.ParentId,
p.Name,
p.Volume,
p.IsSku,
1
FROM Product p
WHERE p.IsSku = 1;
DECLARE @level int = 1;
WHILE (1=1)
BEGIN
INSERT INTO @tmp
(Id, ParentId, Name, Volume, IsSku, Level)
-- OUTPUT inserted.Id, inserted.ParentId, inserted.Name, inserted.Volume, inserted.IsSku, inserted.Level
SELECT
p.Id,
p.ParentId,
p.Name,
t.Volume,
p.IsSku,
@level 1
FROM (
SELECT
t.ParentID,
Volume = SUM(t.Volume)
FROM @tmp t
WHERE t.Level = @level
GROUP BY
t.ParentID
) t
JOIN Product p ON p.Id = t.ParentID;
IF (@@ROWCOUNT = 0)
BREAK;
SET @level = 1;
END;
SELECT *
FROM @tmp
ORDER BY Id;
This solution does involve a blocking operator, due to Halloween protection (in my case I saw an "unnecessary" sort). You can avoid it by using Itzik Ben-Gan's Divide and Conquer method, utilizing two table variables and flip-flopping between them.