Home > database >  Bottom Up Recursive SUM (lowest level only has values)
Bottom Up Recursive SUM (lowest level only has values)

Time:10-12

I have a tree-based structure of SKUs within a Product Hierarchy in SQL Server. The lowest level SKUs will only ever have values (these are consumption values). I then want to generate aggregates up the hierarchy at every level.

Here's is the sample table structure:

Id ParentId Name Volume IsSku
1 -1 All 0 0
2 1 Cat A 0 0
3 1 Cat B 0 0
4 2 Cat A.1 0 0
5 2 Cat A.2 0 0
6 3 Cat B.1 0 0
7 3 Cat B.2 0 0
8 4 SKU1 10 1
9 4 SKU2 5 1
10 5 SKU3 7 1
11 5 SKU4 4 1
12 6 SKU1 10 1
13 6 SKU2 5 1
14 7 SKU3 9 1
15 7 SKU4 7 1

I need a query that will start at the sku level (IsSku=1) and then working up, will sum the SKUs and carry the sum up the product category levels to get a cumulative running total.

I've seen several queries where there are recursive sums in a hierarchical structure where each level already has values, but I need one that will start at the lowest level that has values and recursively calculate the sum as it moves upward.

I was trying these, but they look like they are mainly summing hierarchical data where each node already has a value (Volume, in my case). I need to start at the lowest level and carry the aggregate up as I go up the hierarchy. I tried to emulate the answers in these posts with my data, but wasn't successful so far with my data setup.

The output for the query should be like so:

Id ParentId Name Volume IsSku
1 -1 All 54 0
2 1 Cat A 26 0
3 1 Cat B 28 0
4 2 Cat A.1 15 0
5 2 Cat A.2 11 0
6 3 Cat B.1 12 0
7 3 Cat B.2 16 0
8 4 SKU1 10 1
9 4 SKU2 5 1
10 5 SKU3 7 1
11 5 SKU4 4 1
12 6 SKU1 10 1
13 6 SKU2 2 1
14 7 SKU3 9 1
15 7 SKU4 7 1

I've got a start with a recursive CTE that returns the hierarchy can can aggregate the volume if that node has volume already, but can't seem to figure out how to start at the SKU levels and continue aggregating up the hierarchy.

Here's the start with my CTE:

DECLARE @tblData TABLE
(
    [ID] INT NOT NULL,
    [ParentId] INT NULL,
    [Name] varchar(50) NOT NULL,
    [Volume] int NOT NULL,
    [IsSku] bit
)

INSERT INTO @tblData
VALUES 
 (1,-1,'All',0,0)
,(2,1,'Cat A',0,0)  
,(3,1,'Cat B',0,0)  
,(4,2,'Cat A.1',0,0)  
,(5,2,'Cat A.2',0,0)  
,(6,3,'Cat B.1',0,0)  
,(7,3,'Cat B.2',0,0)  
,(8,4,'SKU1',10,1)  
,(9,4,'SKU2',5,1)  
,(10,5,'SKU3',7,1)  
,(11,5,'SKU4',4,1)  
,(12,6,'SKU1',10,1)  
,(13,6,'SKU2',5,1)  
,(14,7,'SKU3',7,1)  
,(15,7,'SKU4',4,1)  

;WITH cte AS (   
    SELECT
        a.ID
        ,a.ParentID
        ,a.Name
        ,a.Volume
        ,CAST('/'   cast(ID as varchar)   '/' as varchar) Node
        ,0 AS level
        ,IsSku
    FROM @tblData AS a
    WHERE a.ParentID = -1

    UNION ALL

    SELECT
        b.ID
        ,b.ParentID
        ,b.Name
        ,b.Volume
        ,CAST(c.Node   CAST(b.ID as varchar)   '/' as varchar)
        ,level = c.level   1
        ,b.IsSku
    FROM @tblData AS b  
    INNER JOIN cte c
        ON b.ParentId = c.ID
)

SELECT c1.ID, c1.ParentID, c1.Name, c1.Node
    ,ISNULL(SUM(c2.Volume),0)
FROM cte c1
LEFT OUTER JOIN cte c2
    ON c1.Node <> c2.Node
    AND LEFT(c2.Node, LEN(c1.Node)) = c1.Node
GROUP BY c1.ID, c1.ParentID, c1.Name, c1.Node

Any help is appreciated!

CodePudding user response:

This should do it:

DECLARE @tbl TABLE(Id INT, ParentId INT, Name NVARCHAR(255), Volume INTEGER, IsSku BIT)
 
INSERT INTO @tbl
VALUES 
 (1,-1,'All',0,0)
,(2,1,'Cat A',0,0)  
,(3,1,'Cat B',0,0)  
,(4,2,'Cat A.1',0,0)  
,(5,2,'Cat A.2',0,0)  
,(6,3,'Cat B.1',0,0)  
,(7,3,'Cat B.2',0,0)  
,(8,4,'SKU1',10,1)  
,(9,4,'SKU2',5,1)  
,(10,5,'SKU3',7,1)  
,(11,5,'SKU4',4,1)  
,(12,6,'SKU1',10,1)  
,(13,6,'SKU2',5,1)  
,(14,7,'SKU3',7,1)  
,(15,7,'SKU4',4,1)  
SELECT * FROM @tbl
;

WITH cte AS (
    SELECT       
        Id,ParentId, Name, Volume, IsSku, CAST(Id AS VARCHAR(MAX)) AS Hierarchy
    FROM       
        @tbl
    WHERE ParentId=-1
    UNION ALL
    SELECT 
        t.Id,t.ParentId, t.Name, t.Volume, t.IsSku, CAST(c.Hierarchy   '|'   CAST(t.Id AS VARCHAR(MAX)) AS VARCHAR(MAX)) 
    FROM 
        cte c 
        INNER JOIN @tbl t
            ON c.Id = t.ParentId

)
SELECT Id,ParentId, Name, ChildVolume AS Volume, IsSku
FROM (
    SELECT c1.Id, c1.ParentId, c1.Name, c1. Volume, c1.IsSku, SUM(c2.Volume) AS ChildVolume
    FROM cte c1
        LEFT JOIN cte c2 ON c2.Hierarchy LIKE c1.Hierarchy   '%'
    GROUP BY c1.Id, c1.ParentId, c1.Name, c1. Volume, c1.IsSku
) x

Basically the computation happens in three steps:

  1. Capture the Hierarchy recursively for each descendant by concatenating the Ids: CAST(c.Hierarchy '|' CAST(t.Id AS VARCHAR(MAX)) AS VARCHAR(MAX))

  2. Join the resulting table with itself so each record is joined with itself and all its descendants: FROM cte c1 LEFT JOIN cte c2 ON c2.Hierarchy LIKE c1.Hierarchy '%'

  3. Finally aggregate the Volume of each hierarchy by grouping: SUM(c2.Volume) AS ChildVolume

This is in reference to Ed Harper's answer to a similar question here: Hierarchy based aggregation

CodePudding user response:

Due to the way that recursive CTEs work in SQL Server, it is very difficult to get this kind of logic working efficiently. It often either requires self-joining the whole resultset, or using something like JSON or XML.

The problem is that at each recursion of the CTE, although it appears you are working on the whole set at once, it actually only feeds back one row at a time. Therefore grouping is disallowed over the recursion.

Instead, it's much better to simply recurse with a WHILE loop and insert into a temp table or table variable, then read it back to aggregate

Use the OUTPUT clauses to view the intermediate results

DECLARE @tmp TABLE (
  Id INTEGER,
  ParentId INTEGER,
  Name VARCHAR(7),
  Volume INTEGER,
  IsSku INTEGER,
  Level INT,
  INDEX ix CLUSTERED (Level, ParentId, Id)
);

INSERT INTO @tmp
  (Id, ParentId, Name, Volume, IsSku, Level)
-- OUTPUT inserted.Id, inserted.ParentId, inserted.Name, inserted.Volume, inserted.IsSku, inserted.Level
SELECT
  p.Id,
  p.ParentId,
  p.Name,
  p.Volume,
  p.IsSku,
  1
FROM Product p
WHERE p.IsSku = 1;

DECLARE @level int = 1;
WHILE (1=1)
BEGIN
    INSERT INTO @tmp
      (Id, ParentId, Name, Volume, IsSku, Level)
    -- OUTPUT inserted.Id, inserted.ParentId, inserted.Name, inserted.Volume, inserted.IsSku, inserted.Level
    SELECT
      p.Id,
      p.ParentId,
      p.Name,
      t.Volume,
      p.IsSku,
      @level   1
    FROM (
        SELECT
          t.ParentID,
          Volume = SUM(t.Volume)
        FROM @tmp t
        WHERE t.Level = @level
        GROUP BY
          t.ParentID
    ) t
    JOIN Product p ON p.Id = t.ParentID;

    IF (@@ROWCOUNT = 0)
        BREAK;
        
    SET @level  = 1;
END;

SELECT *
FROM @tmp
ORDER BY Id;

db<>fiddle

This solution does involve a blocking operator, due to Halloween protection (in my case I saw an "unnecessary" sort). You can avoid it by using Itzik Ben-Gan's Divide and Conquer method, utilizing two table variables and flip-flopping between them.

  • Related